Abstract. We study approximations of evolving probability measures by an interacting
particle system. The particle system dynamics is a combination of independent
Markov chain moves and importance sampling/resampling steps. Under global regularity
conditions, we derive non-asymptotic error bounds for the particle system approximation.
In a few simple examples, including high-dimensional product measures,
bounds with explicit constants of feasible size are obtained. Our main motivation is
the application to sequential MCMC methods for Monte Carlo integral estimation.



1. Introduction 

1.1. Evolving probability measures. Let $(\mu_t)_{t\in[0,\infty)}$ denote a family of mutually
absolutely continuous probability measures on a set $S$. To keep the presentation as simple
and non-technical as possible, we assume that $S$ is finite. Motivated by Monte Carlo
methods for sequential estimation of expectation values with respect to the probability
measures $\mu_t$ (see e.g. [5, 9, 10, 19] and references therein), we will recall how to obtain
Fokker-Planck type evolution equations on the space of probability measures on $S$ that
are satisfied by $\mu_t$, and how to approximate these equations by interacting particle
systems. The main purpose of this paper is to bound the error of the particle system
approximations by an $L^p$ approach (see Theorems 2.5, 2.6 and 2.10 below).

Sequential Monte Carlo (SMC) methods that combine Markov Chain Monte Carlo
(MCMC) and Importance Sampling/Resampling methods to approximate a given sequence
$(\mu_t)$ of probability measures are used in a variety of applications, see for instance
[7, 10, 34] and references therein. There is by now a substantial literature on approximation
properties of corresponding particle system discretizations, cf. [5, 9, 14] and the
references cited below. Nevertheless, our mathematical understanding of SMC methods
is still far more superficial than that of traditional MCMC methods, where, at least
for some specific models, sharp bounds for mixing times, approximation errors and
dependence on the dimension have been derived. The $L^p$ approach to controlling the
approximation error that we propose here is a first step towards more quantitative results
that might be useful in particular in studying dimensional dependence. In contrast
to most of the literature on SMC methods (see however [14, 35, 36]), we focus on the
continuous time case.

We assume that the measures are represented in the form

$$\mu_t(x) = \frac{1}{Z_t}\,\exp(-U_t(x))\,\mu_0(x), \qquad t \ge 0, \tag{1.1}$$

where $Z_t$ is a normalization constant, and $(t,x) \mapsto U_t(x)$ is a given function on $[0,\infty)\times S$
that is continuously differentiable in the first variable. If, for example, $U_t(x) = tU(x)$
for some function $U : S \to \mathbb{R}$, then $(\mu_t)_{t\ge 0}$ is the exponential family corresponding to $U$
and $\mu_0$. Let

$$H_t(x) := -\frac{\partial}{\partial t}\log\mu_t(x) = -\frac{\partial}{\partial t}\log\frac{\mu_t(x)}{\mu_0(x)}$$

denote the negative logarithmic time derivative of the measures $\mu_t$. Note that

$$\mu_t(x) = \exp\Big(-\int_0^t H_s(x)\,ds\Big)\,\mu_0(x), \tag{1.2}$$

and

$$\langle H_t, \mu_t\rangle = -\frac{d}{dt}\,\mu_t(S) = 0 \quad\text{for all } t \ge 0, \tag{1.3}$$

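where

$$\langle f, \nu\rangle := \int f\,d\nu = \sum_{x\in S} f(x)\,\nu(x)$$

denotes the integral of a function $f : S \to \mathbb{R}$ w.r.t. a measure $\nu$ on $S$. In particular, by (1.1) and (1.3),

$$H_t(x) = \frac{\partial}{\partial t}U_t(x) - \Big\langle \frac{\partial}{\partial t}U_t,\ \mu_t\Big\rangle .$$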

In the applications we have in mind, the functions $U_t$ are given explicitly. Hence $H_t$
is known explicitly up to an additive time-dependent constant. The evaluation of this
constant, however, would require computing an integral w.r.t. $\mu_t$.
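As a small numerical illustration of (1.1)-(1.3), the following sketch treats the exponential family $U_t = tU$ on a three-point state space. The energies `U`, the initial measure `mu0` and the helper names are hypothetical choices made only for this example; the code checks that $\langle H_t,\mu_t\rangle$ vanishes and that integrating $H_s$ as in (1.2) recovers $\mu_t$.

```python
import math

def mu_t(t, U, mu0):
    """mu_t(x) = exp(-t*U(x)) * mu0(x) / Z_t, i.e. (1.1) with U_t = t*U."""
    w = [math.exp(-t * u) * m for u, m in zip(U, mu0)]
    Z = sum(w)
    return [wi / Z for wi in w]

def H_t(t, U, mu0):
    """H_t = dU_t/dt - <dU_t/dt, mu_t>, where dU_t/dt = U in this family."""
    mu = mu_t(t, U, mu0)
    mean_U = sum(u * m for u, m in zip(U, mu))
    return [u - mean_U for u in U]

def mu_from_H(t, U, mu0, steps=2000):
    """Rebuild mu_t from (1.2), integrating H_s over [0, t] by the midpoint rule."""
    h = t / steps
    integral = [0.0] * len(U)
    for k in range(steps):
        Hs = H_t((k + 0.5) * h, U, mu0)
        integral = [a + h * b for a, b in zip(integral, Hs)]
    return [math.exp(-a) * m for a, m in zip(integral, mu0)]

U = [0.0, 1.0, 3.0]      # hypothetical energy function on S = {0, 1, 2}
mu0 = [0.2, 0.3, 0.5]    # hypothetical initial measure
t = 0.7
mu = mu_t(t, U, mu0)
centering = sum(h * m for h, m in zip(H_t(t, U, mu0), mu))   # <H_t, mu_t>
gap = max(abs(a - b) for a, b in zip(mu, mu_from_H(t, U, mu0)))
```

Here `centering` is zero up to rounding, and `gap` is limited only by the quadrature error of the midpoint rule.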

If all the functions $H_t$, $t \ge 0$, vanish, then $\mu_t = \mu_0$ for all $t \ge 0$. In this case the
measures are invariant for a Markov transition semigroup $(p_t)_{t\ge 0}$, i.e.,

$$\mu_s\,p_{t-s} = \mu_t \quad\text{for any } t \ge s \ge 0,$$

provided the generator $\mathcal{L}$ of $(p_t)_{t\ge 0}$ satisfies $\mu_0\mathcal{L} = 0$, i.e.

$$\sum_{x\in S}\mu_0(x)\,\mathcal{L}(x,y) = 0 \quad\text{for any } y \in S.$$

This fact is exploited in Markov Chain Monte Carlo methods for approximating expectation
values w.r.t. the measure $\mu_0$. The particle systems studied below can be applied
for the same purpose when the measures $\mu_t$ are time-dependent.

1.2. Fokker-Planck equation and particle system approximation. To obtain approximations
of the measures $\mu_t$, we consider generators (Q-matrices) $\mathcal{L}_t$, $t \ge 0$, of a
time-inhomogeneous Markov process on $S$ satisfying the detailed balance conditions

$$\mu_t(x)\,\mathcal{L}_t(x,y) = \mu_t(y)\,\mathcal{L}_t(y,x) \qquad \forall\ t \ge 0,\ x,y \in S. \tag{1.4}$$

For example, $\mathcal{L}_t$ could be the generator of a Metropolis dynamics w.r.t. $\mu_t$, i.e.,

$$\mathcal{L}_t(x,y) = K_t(x,y)\cdot\min\Big(\frac{\mu_t(y)}{\mu_t(x)},\,1\Big) \quad\text{for } x \ne y,$$

$\mathcal{L}_t(x,x) = -\sum_{y\ne x}\mathcal{L}_t(x,y)$, where the proposal matrix $K_t$ is a given symmetric transition
matrix on $S$. In the sequel we will use the notation $\mathcal{L}_t^*\mu$ to denote the adjoint action of
the generator on a probability measure $\mu$, i.e.,

$$(\mathcal{L}_t^*\mu)(y) := (\mu\mathcal{L}_t)(y) = \sum_{x\in S}\mu(x)\,\mathcal{L}_t(x,y).$$

By (1.4), $\mathcal{L}_t^*\mu_t = 0$, i.e.,

$$\langle \mathcal{L}_t f, \mu_t\rangle = 0 \quad\text{for any } f : S \to \mathbb{R} \text{ and } t \ge 0.$$
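The Metropolis construction and its consequences can be checked directly on a small state space. The sketch below builds the generator from a symmetric nearest-neighbour proposal on a four-point path; the measure `mu`, the proposal matrix `K` and the function name are hypothetical choices for illustration. It then measures how far detailed balance (1.4) and the invariance $\mu_t\mathcal{L}_t = 0$ are from holding exactly.

```python
def metropolis_generator(mu, K):
    """Q-matrix with L(x,y) = K(x,y) * min(mu(y)/mu(x), 1) for x != y."""
    n = len(mu)
    L = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y:
                L[x][y] = K[x][y] * min(mu[y] / mu[x], 1.0)
        L[x][x] = -sum(L[x])          # rows of a generator sum to zero
    return L

n = 4
mu = [0.1, 0.2, 0.3, 0.4]             # hypothetical target measure
# Symmetric proposal: jump to each neighbour on the path 0-1-2-3 with weight 1/2.
K = [[0.5 if abs(x - y) == 1 else 0.0 for y in range(n)] for x in range(n)]
L = metropolis_generator(mu, K)

# Largest violation of detailed balance mu(x) L(x,y) = mu(y) L(y,x).
db_error = max(abs(mu[x] * L[x][y] - mu[y] * L[y][x])
               for x in range(n) for y in range(n))
# Largest entry of the row vector mu L, which should vanish.
stationarity_error = max(abs(sum(mu[x] * L[x][y] for x in range(n)))
                         for y in range(n))
```

Both error quantities are zero up to floating-point rounding, since $\mu(x)\mathcal{L}(x,y) = K(x,y)\min(\mu(x),\mu(y))$ is symmetric in $x$ and $y$.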

We fix non-negative constants $\lambda_t$, $t \ge 0$, such that $t \mapsto \lambda_t$ is continuous. Since the state
space $S$ is finite, the measures $\mu_t$ are the unique solution of the evolution equation for
measures

$$\frac{d}{dt}\nu_t = \lambda_t\,\mathcal{L}_t^*\nu_t - H_t\,\nu_t \tag{1.5}$$

with initial condition $\nu_0 = \mu_0$. In general, solutions of (1.5) are not necessarily probability
measures, even if $\nu_0$ is a probability measure. Therefore, we consider the equation

$$\frac{d}{dt}\eta_t = \lambda_t\,\mathcal{L}_t^*\eta_t - H_t\,\eta_t + \langle H_t, \eta_t\rangle\,\eta_t \tag{1.6}$$

satisfied by the normalized measures $\eta_t = \nu_t/\nu_t(S)$. Note that, by (1.3), $\mu_t$ also solves (1.6).
Moreover, if $\eta_t$ is a solution of (1.6), then

$$\nu_t = \exp\Big(-\int_0^t \langle H_s, \eta_s\rangle\,ds\Big)\,\eta_t$$

is the unique solution of (1.5) with initial condition $\nu_0 = \eta_0$.

The Fokker-Planck equation (1.6) is an evolution equation for probability measures
which, in contrast to the unnormalized equation, is not modified by adding constants to
the functions $H_t$. We now introduce interacting particle systems that discretize the
evolution equations (1.6) and (1.5). Consider right continuous time-inhomogeneous Markov
processes $(X_t^N, \mathbb{P})$, $N \in \mathbb{N}$, with state space $S^N$ and generators at time $t$ given by

$$\mathcal{L}_t^N\varphi(x_1,\dots,x_N) = \lambda_t\sum_{i=1}^N \mathcal{L}_t^{(i)}\varphi(x_1,\dots,x_N) + \frac{1}{N}\sum_{i,j=1}^N \big(H_t(x_i) - H_t(x_j)\big)^+\,\big(\varphi(x^{i\to j}) - \varphi(x)\big). \tag{1.7}$$

Here $x = (x_1,\dots,x_N) \in S^N$ and

$$x^{i\to j}_k := \begin{cases} x_k & \text{if } k \ne i,\\ x_j & \text{if } k = i.\end{cases}$$

Moreover, $\mathcal{L}_t^{(i)}$ stands for the operator $\mathcal{L}_t$ applied to the $i$-th component of $x$. Thus
the components $X_{t,i}^N$, $i = 1,\dots,N$, of the process $X_t^N$ move like independent Markov
processes with generator $\lambda_t\mathcal{L}_t$ and are occasionally replaced by components with a lower
value of $H_t$. Note that to compute the generator (and hence to simulate the Markov
process) it is enough to know the functions $H_t$ up to an additive constant.
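The dynamics just described can be sketched as a simple time-stepping scheme. This is an Euler-type discretization in time, not the continuous-time process itself, and it rests on several assumptions made only for this illustration: the exponential family $U_t = tU$ (so that differences of $H_t$ equal differences of $U$), a nearest-neighbour Metropolis proposal, a constant intensity `lam`, and the resampling rule that particle $i$ is replaced by particle $j$ at rate $N^{-1}(H_t(x_i) - H_t(x_j))^+$. All function and variable names are hypothetical.

```python
import math
import random

def mu_t(t, U, mu0):
    """Target measure mu_t for the family U_t = t*U."""
    w = [math.exp(-t * u) * m for u, m in zip(U, mu0)]
    Z = sum(w)
    return [wi / Z for wi in w]

def step(particles, t, dt, lam, U, mu0, rng):
    """One time step of length dt; requires lam*dt <= 1 and osc(U)*dt <= 1."""
    N, n = len(particles), len(U)
    mu = mu_t(t, U, mu0)
    new = list(particles)
    for i, x in enumerate(particles):
        # Mutation: Metropolis move w.r.t. mu_t, attempted with probability lam*dt.
        if rng.random() < lam * dt:
            y = x + rng.choice((-1, 1))
            if 0 <= y < n and rng.random() < min(mu[y] / mu[x], 1.0):
                new[i] = y
        # Resampling: a uniformly chosen partner j replaces particle i with
        # probability (H(x_i) - H(x_j))^+ * dt; only differences of H are needed.
        j = rng.randrange(N)
        rate = max(U[x] - U[particles[j]], 0.0)
        if rng.random() < rate * dt:
            new[i] = particles[j]
    return new

def simulate(N, T, dt, lam, U, mu0, seed=0):
    """Run the scheme on [0, T], starting from N independent draws of mu0."""
    rng = random.Random(seed)
    particles = rng.choices(range(len(U)), weights=mu0, k=N)
    for k in range(int(round(T / dt))):
        particles = step(particles, k * dt, dt, lam, U, mu0, rng)
    return particles

U = [0.0, 1.0, 2.0, 3.0, 4.0]     # hypothetical energies on S = {0, ..., 4}
mu0 = [0.2] * 5                   # uniform initial measure
particles = simulate(N=200, T=1.0, dt=0.01, lam=5.0, U=U, mu0=mu0)
eta = [particles.count(x) / len(particles) for x in range(len(U))]
```

The list `eta` is the empirical distribution of the particles at time $T$, which can be compared with `mu_t(1.0, U, mu0)`. Choosing the partner uniformly and accepting with probability proportional to the positive part reproduces the total replacement rate of particle $i$ to first order in `dt`.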

Discretizations of interacting particle systems of a similar type are widely used in
applications, where mostly the time parameter is discrete. Variants appear in the
literature under different names, including sequential Monte Carlo methods (e.g. in
[10, 18, 19]), population Monte Carlo algorithms [4, 17, 35, 36], Feynman-Kac particle
models [6, 9, 14], particle filters [1, 3, 7], etc. Theoretical properties of these Monte
Carlo methods and, in particular, the asymptotics as $N \to \infty$, have been studied
intensively (mostly in discrete time), see e.g. [5, 9] for an overview, and [6, 27] for more
recent results. The continuous time case has been investigated in [14, 35, 36].

The Markov processes $(X_t^N, \mathbb{P})$ introduced above are continuous-time analogues of
a particular type of sequential Monte Carlo samplers which have been introduced and
studied systematically in [10] (cf. also [7, 11, 26, 33]). One major motivation for the
use of SMC samplers is the estimation of expectation values with respect to multimodal
distributions where traditional MCMC methods fail due to metastability problems. The
processes $(X_t^N, \mathbb{P})$ have the additional property that the underlying generator at time $t$
satisfies detailed balance w.r.t. $\mu_t$. In this case, the resulting sequential MCMC methods
are also related to several multi-level sampling methods, including parallel tempering
[22, 25, 31] and the equi-energy sampler [29]. The detailed balance condition is not
necessarily required for applications, but it fixes a clear framework that is the foundation
for our $L^p$ approach developed below.

It is essentially well-known (see [14]) that if the initial distributions of the Markov
processes $(X_t^N, \mathbb{P})$ are the $N$-fold products $\pi^N$ of a probability measure $\pi$ on $S$, then
almost surely, the empirical distributions

$$\eta_t^N = \frac{1}{N}\sum_{i=1}^N \delta_{X_{t,i}^N} \tag{1.8}$$

and the reweighted empirical distributions

$$\nu_t^N = \exp\Big(-\int_0^t \langle H_s, \eta_s^N\rangle\,ds\Big)\,\eta_t^N \tag{1.9}$$

converge to the solutions of the equations (1.6) and (1.5) with initial conditions $\eta_0 = \nu_0 = \pi$,
see also Corollary 2.8 below. As a consequence, simulating the Markov process $X_t^N$
with initial distribution $\mu_0^N$ yields a Monte Carlo method for approximating sequentially
the probability measures $\mu_t$, $t \ge 0$, which can be viewed as a combination of Markov
Chain Monte Carlo and Importance Sampling/Resampling.

1.3. Quantitative convergence bounds. Our main aim is to quantify more explicitly
the approximation properties of the particle systems with initial distribution $\mu_0^N$. There
is a substantial literature on asymptotic properties of corresponding particle system
approximations, see e.g. [9, 14, 35] and references therein. In particular, a law of large
numbers type convergence theorem and a corresponding central limit theorem have been
established in [12, 14] for a related particle system approximation, cf. also [36]. A
crucial question for algorithmic applications, however, is how to obtain quantitative bounds
on the approximation error

$$\langle f, \eta_t^N\rangle - \langle f, \mu_t\rangle \tag{1.10}$$

for a given function $f : S \to \mathbb{R}$ and fixed $N$ that incorporate some more explicit control of
the constants. For example, the dependence of the bounds on the dimension in product
models is very relevant.

The central limit theorem in [14] yields bounds for the approximation error (1.10)
asymptotically as $N \to \infty$ (at least for a modified particle system). In [36] corresponding
non-asymptotic estimates are given but without quantifying the constants. We also refer
to [6] for some more recent non-asymptotic estimates under strong mixing conditions in
discrete time. In this respect, several important questions still remain open:

• The expression for the asymptotic variance in the central limit theorem derived
in [14] is not very explicit, as it involves $L^2$ norms of an associated Feynman-Kac
semigroup. Methods that allow this expression to be bounded efficiently in a general
setup and in concrete models have to be developed.

• For applications it is crucial to derive more explicit non-asymptotic bounds (i.e.
bounds for fixed $N$), because the asymptotic estimates could be misleading when
only a limited number of particles is available. To the best of our knowledge such
bounds have been proven so far only under partially restrictive minorization (see
[40]) or strong mixing conditions involving constants that are not very explicit,
highly dimension-dependent, and far from optimal. In general, tracking the
constants in the proof of the CLT in [14] shows that these could be of order up
to $\exp\int_0^t \operatorname{osc}(H_s)\,ds$, where $\operatorname{osc}(H_s) := \sup H_s - \inf H_s$ stands for the oscillation
of $H_s$. In nearly all interesting applications this quantity is extremely large.
Hence although the existing results give useful indications on scope and limits
of SMC methods, the rigorous verification of a given error bound for a realistic
number $N$ of particles/replicas is still an open problem in many simple concrete
models.

• Dimensional dependence on product spaces is an important issue, cf. [1, 2, 3].
Rigorous results about the dependence on the dimension of error bounds for SMC
methods are still missing, and might be out of reach for the existing techniques.

It is well-known from the theory of reversible Markov processes that a convergence analysis
based only on total variation estimates and Dobrushin contraction coefficients is
possible, but it has several drawbacks. In particular, substantial contractivity w.r.t. the
total variation norm often takes place only after a certain number of steps (cutoff phenomena,
cf. e.g. [15, 16]). This limits the applicability if one is interested in arguments
based on single or even infinitesimal time steps. Moreover, minorization conditions that
are often imposed in this context are crude and typically dimension dependent. Therefore,
in this article we develop the foundations of an alternative approach to establish
non-asymptotic bounds for the particle system approximations, which enables us to
prove bounds with a reasonable dependence on the dimension for product models, see
Example 2 below. The approach we propose is based on a consistent application of $L^p$
estimates instead of uniform estimates for Feynman-Kac propagators. In [20] (cf. also
[39]), an $L^2$ approach has been considered to quantify asymptotic stability properties
of the Fokker-Planck equation. When studying the error of particle system approximations,
we are forced to leave the $L^2$ framework and to work with various $L^p$ norms. A
key tool is the family of $L^p$ estimates for Feynman-Kac propagators that have been derived in
[21].



1.4. Outline. The main results of our work are stated in Section 2. Here we also
consider examples where the approximation errors can be quantified explicitly. Section
3 contains the derivation of an explicit formula for the variances of the estimators $\langle f, \nu_t^N\rangle$,
see Proposition 2.1 below. This is based on martingale arguments developed in [14]. In
Section 4 we apply the formula to prove Theorem 2.5 below, which is a non-asymptotic
bound for the variances. Finally, in Section 5 we combine this bound with the results
from [21] to prove the bounds in Theorems 2.6 and 2.10 below.

2. Main results



To state our results in detail let us consider the Markov process $(X_t^N, \mathbb{P})$ with initial
distribution $\mu_0^N$. To derive error bounds for the particle system approximation it is
convenient to consider at first the error for the Monte Carlo estimates based on the
reweighted empirical distributions $\nu_t^N$ defined in (1.9). Following closely the reasoning
in [14], we first note that, by a martingale argument, it can be shown that $\langle f, \nu_t^N\rangle$ is
an unbiased estimator of $\langle f, \mu_t\rangle$ for any function $f : S \to \mathbb{R}$ and $t \ge 0$, and an explicit
formula for the variance can be given.

2.1. An expression for the variance. To state the formula for the variance, we
introduce Feynman-Kac type transition operators $q_{s,t}$ related to the dynamics. For
$0 \le s \le t < \infty$ and a function $f : S \to \mathbb{R}$, let $q_{s,t}f(x)$ denote the unique solution of the
backward equation

$$-\frac{\partial}{\partial s}\,q_{s,t}f = \lambda_s\,\mathcal{L}_s\,q_{s,t}f - H_s\,q_{s,t}f, \qquad s \in [0,t], \tag{2.1}$$

with terminal condition $q_{t,t}f = f$. It can be shown that $q_{s,t}f$ is also the unique solution
of the corresponding forward equation

$$\frac{\partial}{\partial t}\,q_{s,t}f = q_{s,t}\big(\lambda_t\,\mathcal{L}_t f - H_t f\big), \qquad t \in [s,\infty), \tag{2.2}$$

with initial condition $q_{s,s}f = f$. As a consequence, a probabilistic representation of $q_{s,t}$
is given by the Feynman-Kac formula

$$(q_{s,t}f)(x) = \mathbb{E}_{s,x}\Big[e^{-\int_s^t H_r(X_r)\,dr}\,f(X_t)\Big] \quad\text{for all } x \in S, \tag{2.3}$$

where $(X_t)_{t\ge s}$ is a time-inhomogeneous Markov process w.r.t. $\mathbb{P}_{s,x}$ with generator $\lambda_t\mathcal{L}_t$
and initial condition $X_s = x$, see e.g. [23], [24]. The next proposition is an
adaptation of results in [14, §3.3] to our slightly modified setting.



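Proposition 2.1. For any $f : S \to \mathbb{R}$ and $t \ge 0$,

$$\mathbb{E}\big[\langle f, \nu_t^N\rangle\big] = \langle f, \mu_t\rangle,$$

and

$$\mathbb{E}\Big[\big|\langle f, \nu_t^N\rangle - \langle f, \mu_t\rangle\big|^2\Big] = \frac{1}{N}\operatorname{Var}_{\mu_t}(f) + \frac{1}{N}\int_0^t \mathbb{E}\big[V_{s,t}^N(f)\big]\,ds, \tag{2.4}$$

where

$$V_{s,t}^N(f) = -\big\langle H_s\,(q_{s,t}f)^2, \nu_s^N\big\rangle\,\langle 1, \nu_s^N\rangle - \langle H_s, \nu_s^N\rangle\,\big\langle q_{s,t}f^2 - (q_{s,t}f)^2, \nu_s^N\big\rangle + \frac{1}{2}\iint \big|H_s(z) - H_s(y)\big|\,\big(q_{s,t}f(z) - q_{s,t}f(y)\big)^2\,\nu_s^N(dy)\,\nu_s^N(dz).$$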

Here and in the following $\operatorname{Var}_\mu(f) := \langle f^2, \mu\rangle - \langle f, \mu\rangle^2$ stands for the variance of $f$ with
respect to the measure $\mu$. Although the reasoning is very close to [14], a complete proof
of Proposition 2.1 is given in Section 3 below for the reader's convenience.

Elementary estimates show that the approximation error (1.10) for estimates based
on the empirical distributions $\eta_t^N$ can be controlled by the variance of estimators based
on $\nu_t^N$:

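Lemma 2.2. For all functions $f : S \to \mathbb{R}$ and $t \ge 0$ we have

$$\mathbb{E}\Big[\big|\langle f, \eta_t^N\rangle - \langle f, \mu_t\rangle\big|^2\Big] \le 2\operatorname{Var}\big(\langle f, \nu_t^N\rangle\big) + 2\,\big\|f - \langle f, \mu_t\rangle\big\|_{\sup}^2\,\operatorname{Var}\big(\langle 1, \nu_t^N\rangle\big) \tag{2.5}$$

and

$$\mathbb{E}\Big[\big|\langle f, \eta_t^N\rangle - \langle f, \mu_t\rangle\big|\Big] \le \operatorname{Var}\big(\langle f, \nu_t^N\rangle\big)^{1/2} + \sqrt{2}\,\big\|f - \langle f, \mu_t\rangle\big\|_{\sup}\,\operatorname{Var}\big(\langle 1, \nu_t^N\rangle\big) + \sqrt{2}\,\operatorname{Var}\big(\langle f, \nu_t^N\rangle\big)^{1/2}\,\operatorname{Var}\big(\langle 1, \nu_t^N\rangle\big)^{1/2}, \tag{2.6}$$

where $\|g\|_{\sup} := \sup_{x\in S}|g(x)|$ for any $g : S \to \mathbb{R}$.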

The proof is given in Section 5 below. 

Remark 2.3. A very interesting alternative expression for the variance of normalizing
constants similar to $\langle 1, \nu_t^N\rangle$ in discrete time has recently been derived in [6].

2.2. A quantitative variance bound. Let $p \in [2,\infty[$. Our goal is to prove quantitative
bounds for the approximation errors that hold uniformly for all functions $f : S \to \mathbb{R}$
with $L^p$ norm less than one. Because of Lemma 2.2, the errors can be quantified in terms
of the variance bounds

$$\varepsilon_t^{N,p} := \sup\Big\{\mathbb{E}\Big[\big|\langle f, \nu_s^N\rangle - \langle f, \mu_s\rangle\big|^2\Big]\ \Big|\ f : S \to \mathbb{R} \text{ s.t. } \|f\|_{L^p(\mu_s)} \le 1,\ s \in [0,t]\Big\} \tag{2.7}$$

with $p \in [2,\infty)$. To efficiently bound the quantities $\varepsilon_t^{N,p}$ we apply estimates of $L^p$-$L^q$
operator norms for the operators $q_{s,t}$. Corresponding estimates are derived systematically
in [21]. We first state a general result that bounds the error in terms of the expression
(2.11) and appropriate operator norms, see Theorem 2.5 below.
For $p, q \in [2,\infty]$ with $p \le q$, let us consider the operator norms

n I s \\ls,tf\\LP(ij, s ) 

C s ,t(p) ■= sup^^ , 

C s ,t{P> Q) '■= SU P — iTTii V sup -— V 1, 

/#0 WJWlpQh) /#0 \\J\\LP/Hp t ) 

where $r \in [p,\infty]$ is chosen such that $p^{-1} = q^{-1} + r^{-1}$. Moreover, for $\delta > 0$, we set
C t (p,q,S) := sup / \\H s \\ Lq ^C SjT (p,q) ds. 

r6[0,t] JO 

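We fix a constant $t_0 > 0$, and set

$$\omega := \sup_{s\in[0,t_0]}\operatorname{osc}(H_s), \tag{2.8}$$

where $\operatorname{osc}(f) := \sup f - \inf f$. Since $H_s = -\frac{\partial}{\partial s}\log\mu_s$, the constant $\omega$ controls the
logarithmic time change rate of the measures $\mu_t$. Note that

$$C_t(p,q,\delta) \le t\,\omega\,\sup\big\{C_{s,\tau}(p,q)^2\ \big|\ s,\tau \in [0,t] \text{ s.t. } \tau \ge s + \delta\big\}.$$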

Remark 2.4. Since we assume that the state space is finite, all the constants are finite,
but their numerical values can be very large. It is a straightforward consequence of the
forward equation (2.2) that

$$\mu_s\,q_{s,t} = \mu_t, \qquad 0 \le s \le t, \tag{2.9}$$

and hence $C_{s,t}(1) = 1$. On the other hand, in contrast to Markov transition operators
which are contractions on $L^\infty$, the constants $C_{s,t}(\infty)$ can be extremely large in typical
applications. Therefore bounds on $C_{s,t}(p)$ are very sensitive to the choice of $p$, see [21]
for details. The constants $C_{s,t}(p,q)$ and $C_t(p,q,\delta)$ are related to hyperbound properties
and can only be expected to be bounded in a feasible way if $t - s$ and $\delta$, respectively,
are not too small.
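Identity (2.9) is easy to test numerically on a small state space. The sketch below represents $q_{s,t}$ as a matrix, integrates the backward equation (2.1) from $s = t$ downwards with a classical Runge-Kutta scheme, and compares $\mu_s q_{s,t}$ with $\mu_t$. The model (exponential family $U_t = tU$, nearest-neighbour Metropolis generator, constant intensity `lam`) and all names are hypothetical choices for this illustration.

```python
import numpy as np

U = np.array([0.0, 1.0, 3.0, 2.0])      # hypothetical energies, U_t = t * U
mu0 = np.full(4, 0.25)

def mu(t):
    w = np.exp(-t * U) * mu0
    return w / w.sum()

def H(t):
    return U - mu(t) @ U                 # H_t = U - <U, mu_t>

def metropolis(t):
    """Nearest-neighbour Metropolis generator on {0,...,3} w.r.t. mu_t."""
    m, n = mu(t), len(U)
    L = np.zeros((n, n))
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                L[x, y] = 0.5 * min(m[y] / m[x], 1.0)
        L[x, x] = -L[x].sum()
    return L

def rhs(s, Q, lam):
    """Right-hand side of dQ/ds = -(lam * L_s - diag(H_s)) Q, cf. (2.1)."""
    return -(lam * metropolis(s) - np.diag(H(s))) @ Q

def propagator(s, t, lam=2.0, steps=400):
    """Matrix of q_{s,t}: RK4 steps of size -h from the terminal condition Q = I."""
    Q, h, r = np.eye(len(U)), (t - s) / steps, t
    for _ in range(steps):
        k1 = rhs(r, Q, lam)
        k2 = rhs(r - h / 2, Q - h / 2 * k1, lam)
        k3 = rhs(r - h / 2, Q - h / 2 * k2, lam)
        k4 = rhs(r - h, Q - h * k3, lam)
        Q = Q - h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        r -= h
    return Q

Q = propagator(0.3, 1.0)
residual = np.max(np.abs(mu(0.3) @ Q - mu(1.0)))   # should be close to zero
```

The same matrix `Q` can be used to evaluate ratios of weighted norms of $q_{s,t}f$ and $f$ for particular test functions $f$, which gives lower bounds on the operator norms discussed above.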
For a function $f : S \to \mathbb{R}$, set

$$V_{s,t}(f) := -\big\langle H_s\,(q_{s,t}f)^2, \mu_s\big\rangle + \iint |H_s(x)|\,\big(q_{s,t}f(y) - q_{s,t}f(x)\big)^2\,\mu_s(dx)\,\mu_s(dy). \tag{2.10}$$

Our first main result shows that for $p > 4$ the asymptotic (as $N \to \infty$) variance of the
estimator $\langle f, \nu_t^N\rangle$ is bounded from above by

$$N^{-1}\Big(\operatorname{Var}_{\mu_t}(f) + \int_0^t V_{s,t}(f)\,ds\Big), \tag{2.11}$$

and, more importantly, it gives a non-asymptotic bound for the mean square error
$\operatorname{Var}(\langle f, \nu_t^N\rangle)$ of the same order:

Theorem 2.5. Fix q g]6, oo] and p Let N G N be such that 

N>25 max (2, C to (p, g, 5), C\, (p, ?, 5)) , 
where p is defined by p~ l = q^ 1 + (p/2) -1 and 5 := (17w) _1 . Then, for t G [0, to]; 

iVEfK/,^)-^,^)! 2 ] < Var Mt (/)+ fv s ,t(f)ds 

Jo 



+ 



l + 7C t {p,q,5)e 



N,p 



In particular, 
where 



et' p < (2 + ^CpM- 1 (1 + 10C t (p, ^iv- 1 ) 

IoVs,Af)ds 



(2.12) 



(2.13) 



■ut(p) := sup sup ■ 
re[o,t] fjto 



2 



The proof is given in Section 4 below. To apply Theorem 2.5 we need bounds for
the constants $v_t(p)$ and $C_t(p,q,\delta)$. We will now discuss how to derive such bounds from
Poincaré and logarithmic Sobolev inequalities in the following particular cases:

a) The Markov processes with generators $\mathcal{L}_t$, $t \ge 0$, have "good" global mixing
properties (see §2.3).

b) The state space $S$ can be decomposed into disjoint subsets $S_i$, $i \in I$, such that
$\mathcal{L}_t(x,y) = 0$ for all $t \ge 0$, $x \in S_i$ and $y \in S_j$ with $i \ne j$, and "good" mixing
properties hold on each of the subsets $S_i$ (see §2.5).

2.3. Non-asymptotic bounds from global Poincaré and log Sobolev inequalities.
For $t \ge 0$ and $q \in [1,\infty]$ let us define

$$K_t(q) := \int_0^t \|H_s\|_{L^q(\mu_s)}\,ds.$$

The quantities $K_t(q)$ are a way to control how much the measures $\mu_s$ change for $s \in [0,t]$.
A rough estimate yields

$$v_t(p) \le 5\,K_t(2)\,\sup\big\{C_{s,\tau}(4)^2\ \big|\ 0 \le s \le \tau \le t\big\} \quad\text{for any } p > 4, \tag{2.14}$$

$$C_t(p,q,\delta) \le K_t(q)\,\sup\big\{C_{s,\tau}(p,q)^2\ \big|\ 0 \le s \le s + \delta \le \tau \le t\big\} \quad\text{for any } q > p > 1. \tag{2.15}$$

Hence estimates for $v_t(p)$ and $C_t(p,q,\delta)$ follow from appropriate $L^p$-$L^q$ bounds for the
Feynman-Kac propagators $q_{s,t}$. In [21], we derive such bounds systematically from
Poincaré and logarithmic Sobolev inequalities. To apply these results let us define the
weighted Poincaré and log Sobolev constants

$$A_t := \sup_{f\in\mathcal{S}_0}\frac{\int H_t^-\,f^2\,d\mu_t}{\mathcal{E}_t(f)}, \qquad B_t := \sup_{f\in\mathcal{S}_0}\frac{\big|\int H_t\,f\,d\mu_t\big|^2}{\mathcal{E}_t(f)}, \qquad \gamma_t := \sup_{f\in\mathcal{S}_1}\frac{\int f^2\log|f|\,d\mu_t}{\mathcal{E}_t(f)},$$

where $\mathcal{S}_0 = \{f : S \to \mathbb{R}\ |\ \langle f, \mu_t\rangle = 0,\ f \not\equiv 0\}$, $\mathcal{S}_1 = \{f : S \to \mathbb{R}\ |\ \langle f^2, \mu_t\rangle = 1,\ f \not\equiv 1\}$,
and

$$\mathcal{E}_t(f) = -\int f\,\mathcal{L}_t f\,d\mu_t = \frac{1}{2}\sum_{x,y\in S}\big(f(y) - f(x)\big)^2\,\mathcal{L}_t(x,y)\,\mu_t(x)$$

denotes the Dirichlet form of the self-adjoint operator $\mathcal{L}_t$ on $L^2(S,\mu_t)$. We refer to [37]
for background on Poincaré and logarithmic Sobolev inequalities and their applications
to estimate $L^p$ contractivity properties of transition semigroups and mixing times of
reversible time-homogeneous Markov chains. In [21] we apply similar techniques to
derive $L^p$-$L^q$ bounds for Feynman-Kac propagators. We show that $C_{s,t}(p)$ and $C_{s,t}(p,q)$
are small (in particular less than 2) if the intensities $\lambda_s$, $0 \le s \le t$, of MCMC moves are
sufficiently large in terms of the constants $A_s$, $B_s$ and $\gamma_s$, respectively. By combining
these results with Theorem 2.5 we obtain:

Theorem 2.6. Fix $t_0 > 0$, $q \in\ ]6,\infty[$ and $p \in\ ]\frac{4q}{q-2}, q[$. Suppose that

$$N \ge 40\,\max\big(K_{t_0}(q), 1\big), \quad\text{and} \tag{2.16}$$

\ s > max + Pijp ± 3) t B s , ^o(p, q) lj^ for all s G [0, to], (2.17) 

where $\omega$ is defined by (2.8) and

'2r- 1 2p- 2 p- 1 2p 



a (p, q) := i°g max 



p — 1 p — 2 p — 1 p — 2 

with $\tilde p$ and $r$ determined by $\tilde p^{-1} = q^{-1} + 2p^{-1}$ and $p^{-1} = q^{-1} + r^{-1}$. Then, for $t \in [0,t_0]$,

$$\varepsilon_t^{N,p} \le \big(2 + 8K_t(2)\big)\,N^{-1}\,\big(1 + 16\,K_t(q)\,N^{-1}\big). \tag{2.18}$$

Note that the assumptions on $p$ and $q$ guarantee that $\tilde p > 2$, so that $a(p,q)$ is finite.
The proof of the theorem is given in Section 5 below.
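To get a feeling for the size of the bound (2.18), it can be evaluated directly. The sketch below assumes that the intensity condition (2.17) holds, takes $K_t(2)$ and $K_t(q)$ as given inputs, and searches for the smallest $N$ compatible with (2.16) at which the right-hand side of (2.18) drops below a target $\alpha$. The function names and the sample values are hypothetical.

```python
import math

def eps_bound(N, K2, Kq):
    """Right-hand side of (2.18): (2 + 8*K_t(2)) / N * (1 + 16*K_t(q) / N)."""
    return (2 + 8 * K2) / N * (1 + 16 * Kq / N)

def particles_needed(alpha, K2, Kq):
    """Smallest N satisfying (2.16) for which the bound (2.18) is at most alpha."""
    N = max(1, math.ceil(40 * max(Kq, 1)))   # condition (2.16)
    while eps_bound(N, K2, Kq) > alpha:
        N += 1
    return N

# Hypothetical values: K_t(2) = K_t(q) = 1, target accuracy alpha = 0.05.
N_min = particles_needed(0.05, 1.0, 1.0)     # -> 215
```

For small $\alpha$ the result grows like $(2 + 8K_t(2))/\alpha$, since the correction factor $1 + 16K_t(q)/N$ tends to one.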

Remark 2.7. (i) The theorem shows that if the intensities $\lambda_s$ are large enough, then
already a limited number of particles/replicas suffices to obtain reasonable error bounds.
In particular, if (2.17) holds, then, by (2.18), a number

N ^ 3 + WK t (q) 
a 

of particles guarantees $\varepsilon_t^{N,p} \le \alpha$ for a given $\alpha \in\ ]0, 1/8[$. In particular, as $\alpha \to 0$, a
number of particles of order $O(K_t(q)/\alpha)$ is sufficient to bound the error by $\alpha$.

(ii) Rough bounds for the constants $K_t(q)$, $A_t$ and $B_t$ for $t \in [0,t_0]$ are given by

$$K_t(q) \le t\,\omega, \qquad A_t \le C_t^{\mathrm{Poi}}\,\max H_t^-, \qquad B_t \le C_t^{\mathrm{Poi}}\,\operatorname{Var}_{\mu_t}(H_t),$$

where $\omega$ is defined by (2.8) and

$$C_t^{\mathrm{Poi}} := \sup_{f\in\mathcal{S}_0}\frac{\int f^2\,d\mu_t}{\mathcal{E}_t(f)}$$

denotes the Poincaré constant, i.e., the inverse spectral gap of the generator $\mathcal{L}_t$. Therefore,
assumptions (2.16) and (2.17) in Theorem 2.6 are satisfied if

$$N \ge 40\,\max(t_0\,\omega, 1) \tag{2.19}$$

and 

X s > max{^(maxH; +t {p + 3)Y 3 ,r fls {H s ))Cf oi ,^a{p,q)L0 Ts y (2.20) 
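The quantities in Remark 2.7(ii) can be computed explicitly for small reversible chains. The following sketch obtains the Poincaré constant as the inverse spectral gap by symmetrizing the generator with respect to $\mu_t$, and then evaluates the rough bounds $C_t^{\mathrm{Poi}}\max H_t^-$ and $C_t^{\mathrm{Poi}}\operatorname{Var}_{\mu_t}(H_t)$. It assumes that the generator is irreducible and satisfies detailed balance; the two-state example and all names are hypothetical.

```python
import numpy as np

def poincare_constant(L, mu):
    """Inverse spectral gap of a generator L that is reversible w.r.t. mu."""
    L, mu = np.asarray(L, float), np.asarray(mu, float)
    r = np.sqrt(mu)
    A = r[:, None] * L / r[None, :]      # similar to L; symmetric by detailed balance
    eig = np.sort(np.linalg.eigvalsh(-(A + A.T) / 2))
    return 1.0 / eig[1]                  # eig[0] ~ 0 belongs to the constant functions

def rough_bounds(L, mu, H):
    """Upper bounds C*max(H^-) for A_t and C*Var_mu(H) for B_t."""
    mu, H = np.asarray(mu, float), np.asarray(H, float)
    C = poincare_constant(L, mu)
    var_H = mu @ H**2 - (mu @ H) ** 2
    return C * np.max(np.maximum(-H, 0.0)), C * var_H

# Hypothetical two-state chain with symmetric rates 1/2 and uniform mu.
L = [[-0.5, 0.5], [0.5, -0.5]]
mu = [0.5, 0.5]
H = [-0.5, 0.5]                          # centred: <H, mu> = 0
C_poi = poincare_constant(L, mu)         # spectral gap is 1 here
A_bound, B_bound = rough_bounds(L, mu, H)
```

The symmetrization step is what makes `eigvalsh` applicable: for a reversible generator the matrix $\mu(x)^{1/2}\mathcal{L}(x,y)\mu(y)^{-1/2}$ is symmetric and has the same spectrum as $\mathcal{L}$.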

Theorems 2.5 and 2.6 provide non-asymptotic bounds on the variances of the Monte
Carlo estimators $\langle f, \nu_t^N\rangle$ that hold uniformly over all functions $f \in L^p(\mu_t)$. One can
combine these bounds with (2.12) and (2.6) to obtain more precise non-asymptotic error
bounds for the Monte Carlo estimators $\langle f, \nu_t^N\rangle$ and $\langle f, \eta_t^N\rangle$ for a fixed function $f$:

Corollary 2.8. Suppose that the assumptions of Theorem 2.6 hold, and let $f \in L^p(\mu_t)$.
Then

N^[\{f,^)-{f,^)\ 2 ] < Var Mt (/)+ f V s , t (f) ds + ||/||| p(Mt) + R(t, N) \\f\\ 2 LP(pt) , 

Ny 2 E[\(f,rf)-(f,^)\] < (Var IH (f) + J q V 8>t (f) ds + \\f - (/, »t)\\ 2 LP{lXt) 

+ R(t,N)\\f-(f, fH )\\ Bap 

with explicit constants $R(t,N)$, of order $O(N^{-1})$ in the first bound and of order $O(N^{-1/2})$ in the second.

The proof is given in Section 5 below. 

2.4. Scope and Examples. Summarizing our results, we observe that the derived
error bounds of a given size for the particle system approximation rely
on the following quantities:

(i) A uniform upper bound on the oscillations of the logarithmic time derivatives $H_t$.

(ii) A minimal intensity $\lambda_t$ of MCMC moves. A lower bound for the required intensity
can be given in terms of the constants $A_t$, $B_t$ and $\gamma_t$, or alternatively in terms of
$\omega$, $C_t^{\mathrm{Poi}}$ and $\gamma_t$.

(iii) A minimal number of particles. On a time interval of length $t_0$, a number of
particles of order $O(\omega\,t_0\,\alpha^{-1})$ is sufficient to bound the error $\varepsilon_{t_0}^{N,p}$ by $\alpha$ (provided
$\lambda_t$ is large enough).

We now illustrate the range and limits of applicability of the results in two examples. The
first is a simple one-dimensional example, while the second discusses the dimensional
dependence of the estimates in the case of product measures.

Example 1. Moving Gaussians: one-dimensional case. Suppose that $S = \{a, a+1, \dots, a + A - 1\}$
for some $a \in \mathbb{Z}$ and $A \in \mathbb{N}$, and $(\mu_t)_{t\ge 0}$ are probability measures on $S$
such that

/ (x-m t ) 2 \ 
/i((x) « exp |^ J, x G S. 

We assume that $t \mapsto m_t$ and $t \mapsto \sigma_t$ are continuously differentiable functions such that
$\sigma_t \in\ ]0,\infty[$ and $m_t \in [a, a + A - 1]$ for all $t \ge 0$. Moreover, we assume that the Markov
chain moves are given by a Random Walk Metropolis dynamics (in continuous time),
that is,

'^in(g|g,l), if|y-x| = l, 
0, if \y — x\ > 1. 

In this case, the following upper bounds for $C_t^{\mathrm{Poi}}$ and $\gamma_t$ hold (see the Appendix):



$$C_t^{\mathrm{Poi}} \le 30\,\big((\sigma_t \wedge A) \vee 2\big)^2 \tag{2.21}$$



A 2 



7f< 300 A +300((a f AA) V2) 2 logA (2.22) 



It can be shown that the upper bound for $C_t^{\mathrm{Poi}}$ is of the correct order in $\sigma_t$ and $A$.
The upper bound for $\gamma_t$ could be improved, but $\gamma_t$ is always bounded from below by a
positive multiple of $(A/\sigma_t)^2$. Our results can be applied in the following way. For $t \ge 0$
and $x, y \in S$ we have

\2 („, ™ \2, 



H t (x)-H t (y) 



_ d_ ( {x- m t ) 2 _ (y - m t ) 



dt V a 2 cr 

y't ( x - y)( x + y - 2m t) , x-y 

a t a 2 a 2 



( W'A \m' t \\ A 2 
V a t A , w< 



\ A^ 

W. (2.23) 

/ (7/ 



Therefore, if we choose the time scale in such a way that the condition 



2 ^f + A ~% V ^t°^o] (2-24) 

is satisfied, then

$$\omega = \sup_{t\in[0,t_0]}\operatorname{osc}(H_t) \le 1.$$

Condition (2.24) is an upper bound on the relative change rates of the parameters
$\sigma_t$ and $m_t$. Note that if $A$ is large compared to $\sigma_t$, then only small change rates are
possible. The reason is that in this case the Gaussian measure $\mu_t$ changes too rapidly in
the tails, so that our arguments break down.

Assuming (2.24), Theorem 2.6 and Remark 2.7(iii) imply that

$$\varepsilon_t^{N,p} \le (2 + 8t)\,N^{-1}\,\big(1 + 16\,t\,N^{-1}\big),$$

provided $N \ge 40\,(t_0 \vee 1)$, and (2.20) holds with $\omega = 1$, $\max H_s^-$ and $\operatorname{Var}_{\mu_s}(H_s)$ bounded
by $\omega = 1$, and $C_s^{\mathrm{Poi}}$, $\gamma_s$ replaced by the upper bounds in (2.21), (2.22). If $(\sigma_t \wedge 1)/A$ is
not too small, this yields reasonably sized (although far from optimal) lower bounds on
$\lambda_t$ and $N$. On the other hand, if $\sigma_t/A \to 0$, then the upper bounds in both (2.23) and
(2.22) degenerate drastically.

Example 2. Product measures — dependence on the dimension. In our second 
example we study the dependence of (2.19), (2.20) and (2.18) on the dimension in the 
case when the evolving measures are all product measures. Suppose that 

$$S = \prod_{i=1}^d S_i, \qquad \mu_t = \bigotimes_{i=1}^d \mu_t^{(i)},$$

with probability measures $\mu_t^{(i)}$, $t \ge 0$, $i = 1,\dots,d$, on finite sets $S_i$ such that $t \mapsto \mu_t^{(i)}(x)$
is continuously differentiable and strictly positive for all $1 \le i \le d$ and $x \in S_i$. In this
case one has

$$H_t(x) = \sum_{i=1}^d H_t^{(i)}(x_i),$$

where $H_t$ and $H_t^{(i)}$ denote the negative logarithmic time derivatives of the measures $\mu_t$
and $\mu_t^{(i)}$, respectively. If we assume

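$$\operatorname{osc}\big(H_t^{(i)}\big) \le 1 \qquad \forall\ t \in [0,t_0],\ i = 1,\dots,d,$$

then

$$\omega = \sup_{t\in[0,t_0]}\operatorname{osc}(H_t) \le d, \tag{2.25}$$

and

$$\operatorname{Var}_{\mu_t}(H_t) = \sum_{i=1}^d \operatorname{Var}_{\mu_t^{(i)}}\big(H_t^{(i)}\big) \le d.$$

Now suppose that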

£t(^,y) = ^2c ( t\x 



,Vi) 



=1 



for generators $\mathcal{L}_t^{(i)}$, $t \ge 0$, $i = 1,\dots,d$, of time-inhomogeneous Markov processes on $S_i$,
i.e. $\mathcal{L}_t$ is the generator of the product dynamics on $S$ with component generators $\mathcal{L}_t^{(i)}$.
It is well known that $\mathcal{L}_t$ satisfies Poincaré and logarithmic Sobolev inequalities with
constants

$$C_t^{\mathrm{Poi}} = \max_{i=1,\dots,d} C_t^{\mathrm{Poi},(i)}, \qquad \gamma_t = \max_{i=1,\dots,d}\gamma_t^{(i)},$$

respectively, where $C_t^{\mathrm{Poi},(i)}$ and $\gamma_t^{(i)}$ are the Poincaré and logarithmic Sobolev constants
for the generators $\mathcal{L}_t^{(i)}$. In particular, if the component generators satisfy Poincaré
and logarithmic Sobolev inequalities with constants independent of $i$, then $\mathcal{L}_t$ satisfies
the corresponding inequalities with the same constants, independently of the dimension
$d$. Therefore, in this case, the values of $N$ and $\lambda_s$ required to satisfy conditions (2.19) and
(2.20) are of order $O(d)$. Hence both the number of particles/replicas and the intensity of
MCMC moves required are of order $O(d)$. Since simulating from the product dynamics
also requires $O(d)$ steps, the total effort to keep track of the evolving product measures
up to a given precision is of order $O(d^3)$.
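The dimension-free behaviour of the Poincaré constant under products can be verified numerically for $d = 2$. In the sketch below the product generator is written as a Kronecker sum of two component generators, and its Poincaré constant is compared with the maximum of the component constants. The two-state component chains and all names are hypothetical; the code assumes reversible, irreducible components.

```python
import numpy as np

def poincare_constant(L, mu):
    """Inverse spectral gap of a generator L that is reversible w.r.t. mu."""
    L, mu = np.asarray(L, float), np.asarray(mu, float)
    r = np.sqrt(mu)
    A = r[:, None] * L / r[None, :]          # symmetric by detailed balance
    eig = np.sort(np.linalg.eigvalsh(-(A + A.T) / 2))
    return 1.0 / eig[1]

# Two hypothetical component chains, each reversible w.r.t. its own measure.
L1, mu1 = np.array([[-1.0, 1.0], [2.0, -2.0]]), np.array([2 / 3, 1 / 3])
L2, mu2 = np.array([[-0.5, 0.5], [0.5, -0.5]]), np.array([0.5, 0.5])

# Product dynamics: each coordinate moves independently (Kronecker sum),
# and the invariant measure is the product measure.
I2 = np.eye(2)
L_prod = np.kron(L1, I2) + np.kron(I2, L2)
mu_prod = np.kron(mu1, mu2)

C1 = poincare_constant(L1, mu1)
C2 = poincare_constant(L2, mu2)
C_prod = poincare_constant(L_prod, mu_prod)
```

The eigenvalues of a Kronecker sum are the pairwise sums of the component eigenvalues, so the smallest non-zero eigenvalue of the product is the smaller of the two component gaps, and `C_prod` agrees with `max(C1, C2)`.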

Remark 2.9 (Independent particles). We compare briefly with the particle dynamics
without importance sampling/resampling, i.e., when the second summand is omitted in
the definition (1.7) of the generator $\mathcal{L}_t^N$. In this case, the particles/replicas move
independently according to the time-inhomogeneous Markovian dynamics with generators
$\lambda_t\mathcal{L}_t$, $t \ge 0$. Hence the positions of the particles at time $t$ are independent random variables
with distribution $\tilde\mu_t = \mu_0\,p_{0,t}$, where $p_{s,t}$, $0 \le s \le t$, is the time-inhomogeneous transition
function. A corresponding discrete-time dynamics is used for example in the classical
simulated annealing algorithm (see e.g. [13, 30]). Since in general $\tilde\mu_t \ne \mu_t$, the empirical
distribution of the independent particle system is an asymptotically biased estimator for
$\mu_t$. However, under strong mixing conditions as imposed above, the difference between
$\tilde\mu_t$ and $\mu_t$, and hence the asymptotic bias, will be small. Therefore it is possible that,
for fixed $N$, the empirical distribution of the independent particles process is a better
estimate for $\mu_t$ than $\eta_t^N$. On the other hand, if the mixing properties break down, the
bias of the independent particles estimator will not be small, whereas the empirical
measures $\nu_t^N$ and $\eta_t^N$ may still be suitable estimators. This will be demonstrated now in a
particular case.

2.5. Non-asymptotic bounds from local estimates. With suitable modifications 
the above analysis can also be applied to derive bounds when good mixing properties 
hold only locally. As an illustration, we consider another extreme case in which the state 
space is decomposed into several components that are not connected by the underlying 
Markovian dynamics. Suppose that 



$$S = \bigcup_{i\in I} S_i$$

is a decomposition of $S$ into disjoint non-empty subsets $S_i$, $i \in I$, such that

$$\mathcal{L}_t(x,y) = 0 \quad\text{for any } t \ge 0,\ x \in S_i \text{ and } y \in S_j \text{ with } i \ne j.$$

Let $\mu_t^i := \mu_t(\,\cdot\,|S_i)$ denote the measure $\mu_t$ conditioned on $S_i$. Then we can apply the
arguments above with the $L^p$ norm replaced by the stronger norm

$$\|f\|_{\tilde L^p(\mu_t)} := \max_{i\in I}\|f\|_{L^p(S_i,\mu_t^i)}.$$

Since Hölder's inequality and related estimates hold for these modified $L^p$ norms as well,
the assertion of Theorem 2.5 still remains true if $\varepsilon_t^{N,p}$ is replaced by

$$\tilde\varepsilon_t^{N,p} := \sup\Big\{\mathbb{E}\Big[\big|\langle f, \nu_s^N\rangle - \langle f, \mu_s\rangle\big|^2\Big]\ \Big|\ f : S \to \mathbb{R} \text{ s.t. } \|f\|_{\tilde L^p(\mu_s)} \le 1,\ s \in [0,t]\Big\},$$

and the constants $C_{s,t}(p,q)$ and $C_t(p,q,\delta)$ are defined w.r.t. the modified $L^p$ and $L^q$
norms as well. Moreover, the representations (1.2) and (1.3) hold for $\mu_t^i$ in place of $\mu_t$ if
$H_t$ is replaced by

$$H_t^i := H_t - \langle H_t, \mu_t^i\rangle.$$

Let $A_t^i$, $B_t^i$ and $\gamma_t^i$ denote the Poincaré and logarithmic Sobolev constants defined as 
above but with $S$, $\mu_t$ and $H_t$ replaced by $S_i$, $\mu_t^i$ and $H_t^i$, respectively. Let us also set 

$$A_t := \max_{i \in I} A_t^i, \qquad B_t := \max_{i \in I} B_t^i, \qquad \gamma_t := \max_{i \in I} \gamma_t^i,$$

$$K_t(q) := \int_0^t \|H_s\|_{\tilde L^q(\mu_s)}\, ds, \quad \text{and}$$

$$M_t := \max_{i \in I}\ \sup_{0 \le r \le s \le t} \frac{\mu_s(S_i)}{\mu_r(S_i)}.$$

Then, by estimating $L^p$ norms separately on each component, we can prove the following 
extension of Theorem 2.6: 

Theorem 2.10. Fix to > 0, q g]6, 00 [ and p Suppose that 

N > 40max(K k) (q), 1), and 

(pA s p(p + 3) ~ 17 \ 
H 1 t B s , —a(p, q)u^ s j for all s G [0, t \. (2.26) 

Then, fort G [0,to], one has 

et' p < (2 + %K t {2)M?) N^ 1 (1 + !6K t (q)Mt N' 1 ). 
Remark 2.11. (i) If there is only one component, the assertion of Theorem 2.10 reduces 
to that of Theorem 2.6. 

(ii) Error bounds for the estimators $\langle f, \nu_t^N\rangle$ and $\langle f, \eta_t^N\rangle$ for a fixed function $f$ hold 
analogously to Corollary 2.8. 

2.6. Open problems. 1) The cases discussed in Sections 2.3 and 2.5 are extreme cases. 
In many typical applications, one would expect the state space to split up as time 
evolves into more and more components that get almost disconnected by the dynamics 
(local modes, metastable states). The study of such more complicated situations is an 
important topic for future research. 

2) We have discussed here a setup with discrete state space and continuous time. In 
continuous time, particle systems on more general state spaces can in principle be treated 
by similar techniques, although of course additional technical considerations are required 
(cf. for instance [36]). For algorithmic applications, the case of discrete time and a 
continuous state space is probably the most interesting one. For an overview of the 
substantial literature and some more recent results in this case we refer to [2, 5, 8, 9, 14, 
10, 18, 27] and references therein. An $L^p$ approach similar to the one presented here is 
developed for the discrete time case in the PhD thesis of N. Schweizer [38]. 

3. Variances of weighted empirical averages 

In this section we will prove Proposition 2.1, which shows that $\langle f, \nu_t^N\rangle$ is an unbiased 
estimator for $\langle f, \mu_t\rangle$ and gives an explicit formula for the variance. The proof follows 
the arguments developed in [14] relying on the identification of appropriate martingales. 

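Recall that the carré du champ (square field) operator associated to $\mathcal{L}_t^N$ is defined 
for functions $\varphi : S^N \to \mathbb{R}$ by 

$$\Gamma_t^N(\varphi) = \mathcal{L}_t^N \varphi^2 - 2\varphi\, \mathcal{L}_t^N \varphi,$$

i.e., 

$$\Gamma_t^N(\varphi)(x) = \sum_{y \in S^N} \mathcal{L}_t^N(x,y)\,(\varphi(y) - \varphi(x))^2 \qquad \forall x \in S^N. \tag{3.1}$$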
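
The equivalence of the two expressions for the carré du champ operator only uses that the rows of a generator sum to zero. The following is a minimal numerical sketch of that fact, not part of the original argument: the helper names (`random_generator`, `gamma_operator_form`, `gamma_sum_form`) and the small random generator matrix are illustrative assumptions.

```python
import random

def random_generator(n, rng):
    """Random generator matrix: nonnegative off-diagonal rates, rows summing to zero."""
    L = [[rng.random() if x != y else 0.0 for y in range(n)] for x in range(n)]
    for x in range(n):
        L[x][x] = -sum(L[x])  # diagonal entry makes the row sum vanish
    return L

def apply(L, phi):
    """(L phi)(x) = sum_y L(x, y) phi(y)."""
    n = len(phi)
    return [sum(L[x][y] * phi[y] for y in range(n)) for x in range(n)]

def gamma_operator_form(L, phi):
    """Carre du champ written as L(phi^2) - 2 phi L(phi)."""
    L_phi = apply(L, phi)
    L_phi2 = apply(L, [v * v for v in phi])
    return [L_phi2[x] - 2.0 * phi[x] * L_phi[x] for x in range(len(phi))]

def gamma_sum_form(L, phi):
    """Carre du champ written as sum_y L(x, y) (phi(y) - phi(x))^2."""
    n = len(phi)
    return [sum(L[x][y] * (phi[y] - phi[x]) ** 2 for y in range(n)) for x in range(n)]

rng = random.Random(0)
L = random_generator(6, rng)
phi = [rng.uniform(-1.0, 1.0) for _ in range(6)]
max_gap = max(abs(u - v) for u, v in zip(gamma_operator_form(L, phi), gamma_sum_form(L, phi)))
```

Here `max_gap` should vanish up to floating-point rounding, since the two forms differ exactly by $\varphi(x)^2 \sum_y \mathcal{L}(x,y) = 0$.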

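It is well-known that the processes 

$$M_t^\varphi = \varphi(t, X_t^N) - \varphi(0, X_0^N) - \int_0^t \Big(\frac{\partial}{\partial s} + \mathcal{L}_s^N\Big)\varphi(s, X_s^N)\, ds, \quad \text{and} \tag{3.2}$$

$$N_t^\varphi = (M_t^\varphi)^2 - \int_0^t \Gamma_s^N(\varphi(s,\cdot))(X_s^N)\, ds \tag{3.3}$$

are martingales w.r.t. the filtration induced by the process $X_t^N$ for any function $\varphi : 
\mathbb{R}_+ \times S^N \to \mathbb{R}$ that is twice continuously differentiable in the first variable, cf. e.g. [28, 
Appendix 1, Lemma 5.1]. For $x \in S^N$ let 

$$\eta(x) = \frac{1}{N} \sum_{i=1}^N \delta_{x_i}$$

denote the corresponding empirical average. In the next lemma we derive expressions 
for $\mathcal{L}_t^N$ and $\Gamma_t^N$ acting on linear functions on $S^N$ of the form 

$$\varphi_f(x) = \langle f, \eta(x)\rangle = N^{-1} \sum_{i=1}^N f(x_i).$$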

APPROXIMATIONS OF EVOLVING PROBABILITY MEASURES 15 
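Lemma 3.1. For any function $f : S \to \mathbb{R}$ and $t \ge 0$, one has 

$$\mathcal{L}_t^N \langle f, \eta\rangle = \lambda_t \langle \mathcal{L}_t f, \eta\rangle + \langle H_t, \eta\rangle \langle f, \eta\rangle - \langle H_t f, \eta\rangle$$

and 

$$\Gamma_t^N(\langle f, \eta\rangle) = \frac{\lambda_t}{N} \langle \Gamma_t(f), \eta\rangle + \frac{1}{N} \iint (H_t(y) - H_t(z))^+ (f(z) - f(y))^2\, \eta(dy)\, \eta(dz),$$

where $\Gamma_t$ denotes the carré du champ operator w.r.t. $\mathcal{L}_t$. 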
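Proof. The definition of $\mathcal{L}_t^N$ immediately yields 

$$\mathcal{L}_t^N \langle f, \eta\rangle(x) = \frac{\lambda_t}{N} \sum_{i=1}^N (\mathcal{L}_t f)(x_i) + \frac{1}{N^2} \sum_{i,j=1}^N (H_t(x_i) - H_t(x_j))^+ (f(x_j) - f(x_i)). \tag{3.4}$$

Moreover, 

$$\begin{aligned}
\sum_{i,j=1}^N (H_t(x_i) - H_t(x_j))^+ (f(x_j) - f(x_i))
&= \sum_{i,j:\, H_t(x_i) > H_t(x_j)} (H_t(x_i) - H_t(x_j))(f(x_j) - f(x_i)) \\
&= \sum_{i,j:\, H_t(x_j) > H_t(x_i)} (H_t(x_j) - H_t(x_i))(f(x_i) - f(x_j)) \\
&= \sum_{i,j:\, H_t(x_j) > H_t(x_i)} (H_t(x_i) - H_t(x_j))(f(x_j) - f(x_i)) \\
&= -\sum_{i,j=1}^N (H_t(x_i) - H_t(x_j))^- (f(x_j) - f(x_i)),
\end{aligned}$$

and hence 

$$\sum_{i,j=1}^N (H_t(x_i) - H_t(x_j))(f(x_j) - f(x_i)) = 2 \sum_{i,j=1}^N (H_t(x_i) - H_t(x_j))^+ (f(x_j) - f(x_i)).$$

Therefore the second term on the right hand side of (3.4) is equal to 

$$\begin{aligned}
\frac{1}{2N^2} \sum_{i,j=1}^N (H_t(x_i) - H_t(x_j))(f(x_j) - f(x_i))
&= \Big(\frac{1}{N}\sum_{i=1}^N H_t(x_i)\Big)\Big(\frac{1}{N}\sum_{j=1}^N f(x_j)\Big) - \frac{1}{N}\sum_{i=1}^N H_t(x_i) f(x_i) \\
&= \langle H_t, \eta(x)\rangle \langle f, \eta(x)\rangle - \langle H_t f, \eta(x)\rangle,
\end{aligned}$$

from which the first claim follows. 
Furthermore, since 

$$\langle f, \eta(x^{i \to j})\rangle - \langle f, \eta(x)\rangle = N^{-1}(f(x_j) - f(x_i)),$$

where $x^{i \to j}$ denotes the configuration obtained from $x$ by replacing the component $x_i$ by $x_j$, 
(3.1) and (1.7) imply 

$$\Gamma_t^N \langle f, \eta\rangle(x) = \frac{\lambda_t}{N^2} \sum_{i=1}^N \sum_{y \in S} \mathcal{L}_t(x_i, y)(f(y) - f(x_i))^2 + \frac{1}{N^3} \sum_{i,j=1}^N (H_t(x_i) - H_t(x_j))^+ (f(x_j) - f(x_i))^2,$$

from which the second claim follows noting that the first term on the right hand side of 
the previous expression is equal to 

$$\frac{\lambda_t}{N^2} \sum_{i=1}^N \Gamma_t(f)(x_i) = \frac{\lambda_t}{N} \langle \Gamma_t(f), \eta(x)\rangle. \qquad \Box$$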
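
The symmetrization step in the proof of the first claim can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the lists `H` and `f` stand for the values $H_t(x_i)$ and $f(x_i)$ at arbitrary particle positions.

```python
import random

def resampling_drift(H, f):
    """N^{-2} * sum_{i,j} (H_i - H_j)^+ (f_j - f_i), the resampling part of the generator."""
    n = len(H)
    total = sum(max(H[i] - H[j], 0.0) * (f[j] - f[i]) for i in range(n) for j in range(n))
    return total / n ** 2

def closed_form(H, f):
    """<H, eta><f, eta> - <H f, eta> for the empirical measure eta of the particles."""
    n = len(H)
    mean_H = sum(H) / n
    mean_f = sum(f) / n
    mean_Hf = sum(h * v for h, v in zip(H, f)) / n
    return mean_H * mean_f - mean_Hf

rng = random.Random(1)
H = [rng.uniform(-2.0, 2.0) for _ in range(7)]
f = [rng.uniform(-1.0, 1.0) for _ in range(7)]
gap = abs(resampling_drift(H, f) - closed_form(H, f))
```

The two expressions should agree up to rounding for any choice of values, which is exactly the content of the identity derived from (3.4).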



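Now let us define 

$$\bar A^f_{s,t} := \langle q_{s,t} f, \eta_s^N\rangle, \qquad 0 \le s \le t.$$

As a consequence of Lemma 3.1 we obtain: 

Proposition 3.2. The processes $M_u^f$ and $N_u^f$, $u \in [0,t]$, defined by 

$$M_u^f = \bar A^f_{u,t} - \bar A^f_{0,t} - \int_0^u \langle H_s, \eta_s^N\rangle \langle q_{s,t} f, \eta_s^N\rangle\, ds,$$

$$\begin{aligned}
N_u^f = (M_u^f)^2 &- \frac{1}{N}\int_0^u \lambda_s \langle \Gamma_s(q_{s,t} f), \eta_s^N\rangle\, ds \\
&- \frac{1}{N}\int_0^u \iint (H_s(y) - H_s(z))^+ (q_{s,t} f(z) - q_{s,t} f(y))^2\, \eta_s^N(dy)\, \eta_s^N(dz)\, ds
\end{aligned}$$

are martingales w.r.t. the filtration $\mathcal{F}_t = \sigma(X_s^N \mid s \in [0,t])$. 
Proof. Note that $\bar A^f_{s,t} = \varphi(s, X_s^N)$, where 

$$\varphi(s,x) = N^{-1} \sum_{i=1}^N q_{s,t} f(x_i).$$

By the backward equation (2.1), 

$$\begin{aligned}
\frac{\partial}{\partial s}\varphi(s,x) &= -\frac{\lambda_s}{N} \sum_{i=1}^N \mathcal{L}_s q_{s,t} f(x_i) + \frac{1}{N}\sum_{i=1}^N H_s q_{s,t} f(x_i) \\
&= -\lambda_s \langle \mathcal{L}_s q_{s,t} f, \eta(x)\rangle + \langle H_s q_{s,t} f, \eta(x)\rangle,
\end{aligned}$$

and by Lemma 3.1, 

$$(\mathcal{L}_s^N \varphi)(s,x) = \lambda_s \langle \mathcal{L}_s q_{s,t} f, \eta(x)\rangle + \langle H_s, \eta(x)\rangle \langle q_{s,t} f, \eta(x)\rangle - \langle H_s q_{s,t} f, \eta(x)\rangle.$$

Hence 

$$\Big(\frac{\partial}{\partial s} + \mathcal{L}_s^N\Big)\varphi(s,x) = \langle H_s, \eta(x)\rangle \langle q_{s,t} f, \eta(x)\rangle,$$

which proves that $M^f = M^\varphi$ is a martingale, cf. (3.2). Similarly, by Lemma 3.1, 

$$\Gamma_s^N(\varphi)(s,x) = \frac{\lambda_s}{N}\langle \Gamma_s(q_{s,t} f), \eta(x)\rangle + \frac{1}{N}\iint (H_s(y) - H_s(z))^+ (q_{s,t} f(z) - q_{s,t} f(y))^2\, \eta(x)(dy)\, \eta(x)(dz),$$

which proves that $N^f = N^\varphi$ is a martingale, cf. (3.3). □ 

Since in general, $\bar A^f_{s,t}$ is not a martingale, $\langle f, \eta_t^N\rangle$ is not an unbiased estimator for 
$\langle f, \mu_t\rangle$. This motivates considering $\langle f, \nu_t^N\rangle$ instead. Let 

$$A^f_{s,t} := \langle q_{s,t} f, \nu_s^N\rangle = \exp\Big(-\int_0^s \langle H_r, \eta_r^N\rangle\, dr\Big)\, \bar A^f_{s,t}.$$

Proposition 3.3. The process $A^f_{u,t}$, $u \in [0,t]$, is a martingale with increasing process 
given by 

$$\begin{aligned}
\langle A^f_{\cdot,t}\rangle_u = {}& \frac{1}{N}\int_0^u \lambda_s \langle 1, \nu_s^N\rangle \langle \Gamma_s(q_{s,t} f), \nu_s^N\rangle\, ds \\
&+ \frac{1}{N}\int_0^u \iint (H_s(x) - H_s(y))^+ (q_{s,t} f(y) - q_{s,t} f(x))^2\, \nu_s^N(dx)\, \nu_s^N(dy)\, ds.
\end{aligned}$$

Proof. By the integration by parts formula for Stieltjes integrals and Proposition 3.2, 
we get 

$$\begin{aligned}
A^f_{u,t} - A^f_{0,t} &= \int_0^u e^{-\int_0^s \langle H_r, \eta_r^N\rangle dr}\, d\bar A^f_{s,t} - \int_0^u e^{-\int_0^s \langle H_r, \eta_r^N\rangle dr} \langle H_s, \eta_s^N\rangle \bar A^f_{s,t}\, ds \\
&= \int_0^u e^{-\int_0^s \langle H_r, \eta_r^N\rangle dr}\, dM_s^f.
\end{aligned}$$

Hence $[0,t] \ni u \mapsto A^f_{u,t}$ is a martingale whose increasing process can be written as 

$$\langle A^f_{\cdot,t}\rangle_u = \int_0^u e^{-2\int_0^s \langle H_r, \eta_r^N\rangle dr}\, d\langle M^f\rangle_s.$$

The result now follows by Proposition 3.2 and Equation (1.9). □ 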

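The purpose of the next lemma is to obtain an alternative representation (modulo 
martingale terms) of the term involving the carré du champ operator in the expression 
for $\langle A^f_{\cdot,t}\rangle$. 

Lemma 3.4. The following decomposition holds: 

$$\begin{aligned}
\int_0^u \lambda_s \langle 1, \nu_s^N\rangle \langle \Gamma_s(q_{s,t} f), \nu_s^N\rangle\, ds
= {}& M_u + \langle 1, \nu_u^N\rangle \langle (q_{u,t} f)^2, \nu_u^N\rangle + \int_0^u \langle H_s, \nu_s^N\rangle \langle (q_{s,t} f)^2, \nu_s^N\rangle\, ds \\
&- \int_0^u \langle 1, \nu_s^N\rangle \langle H_s (q_{s,t} f)^2, \nu_s^N\rangle\, ds,
\end{aligned}$$

where $M$ is a martingale. 
Proof. Let 

$$Y_u := \langle 1, \nu_u^N\rangle \langle (q_{u,t} f)^2, \nu_u^N\rangle = e^{-2\int_0^u \langle H_r, \eta_r^N\rangle dr} \langle (q_{u,t} f)^2, \eta_u^N\rangle.$$

By applying the martingale problem to the function $\varphi(s,x) = \langle (q_{s,t} f)^2, \eta(x)\rangle$, we obtain 

$$Y_u \sim \int_0^u e^{-2\int_0^s \langle H_r, \eta_r^N\rangle dr} \Big(\frac{\partial}{\partial s} + \mathcal{L}_s^N - 2\langle H_s, \eta_s^N\rangle\Big)\varphi(s, X_s^N)\, ds.$$

Here and in the following we write $Y_u \sim Z_u$ if the processes $Y_u$ and $Z_u$ differ only by a 
martingale term. Proceeding as in the proof of Proposition 3.2, we get that 

$$\frac{\partial}{\partial s}\varphi(s, X_s^N) = -2\lambda_s \langle q_{s,t} f\, \mathcal{L}_s q_{s,t} f, \eta_s^N\rangle + 2\langle H_s (q_{s,t} f)^2, \eta_s^N\rangle,$$

and 

$$\mathcal{L}_s^N \varphi(s, X_s^N) = \lambda_s \langle \mathcal{L}_s (q_{s,t} f)^2, \eta_s^N\rangle + \langle H_s, \eta_s^N\rangle \langle (q_{s,t} f)^2, \eta_s^N\rangle - \langle H_s (q_{s,t} f)^2, \eta_s^N\rangle.$$

Recalling that $\mathcal{L}_s (q_{s,t} f)^2 - 2 q_{s,t} f\, \mathcal{L}_s q_{s,t} f = \Gamma_s(q_{s,t} f)$ and $\nu_s^N = \exp(-\int_0^s \langle H_r, \eta_r^N\rangle\, dr)\, \eta_s^N$, 
we conclude 

$$\begin{aligned}
\langle 1, \nu_u^N\rangle \langle (q_{u,t} f)^2, \nu_u^N\rangle \sim {}& -\int_0^u \langle H_s, \nu_s^N\rangle \langle (q_{s,t} f)^2, \nu_s^N\rangle\, ds \\
&+ \int_0^u \langle 1, \nu_s^N\rangle \langle H_s (q_{s,t} f)^2, \nu_s^N\rangle\, ds + \int_0^u \lambda_s \langle 1, \nu_s^N\rangle \langle \Gamma_s(q_{s,t} f), \nu_s^N\rangle\, ds,
\end{aligned}$$

which proves the assertion. □ 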
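Lemma 3.5. For all $t \ge 0$, 

$$\mathbb{E}\Big[\int_0^t \langle H_s, \nu_s^N\rangle \langle q_{s,t} f^2, \nu_s^N\rangle\, ds\Big] = \langle f^2, \mu_t\rangle - \mathbb{E}\big[\langle 1, \nu_t^N\rangle \langle f^2, \nu_t^N\rangle\big].$$

Proof. Write $A^{f^2}_{s,t} := \langle q_{s,t} f^2, \nu_s^N\rangle$. By the product rule for Stieltjes integrals, 

$$\langle 1, \nu_t^N\rangle \langle f^2, \nu_t^N\rangle = A^{f^2}_{0,t} + \int_0^t \langle 1, \nu_s^N\rangle\, dA^{f^2}_{s,t} - \int_0^t \langle H_u, \nu_u^N\rangle A^{f^2}_{u,t}\, du.$$

Since $s \mapsto A^{f^2}_{s,t}$ is a martingale, 

$$\mathbb{E}\big[\langle 1, \nu_t^N\rangle \langle f^2, \nu_t^N\rangle\big] = \langle q_{0,t} f^2, \mu_0\rangle - \mathbb{E}\Big[\int_0^t \langle H_u, \nu_u^N\rangle A^{f^2}_{u,t}\, du\Big].$$

The proof is completed by noting that $\langle q_{0,t} f^2, \mu_0\rangle = \langle f^2, \mu_t\rangle$. □ 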



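Proof of Proposition 2.1. Fix a function $f : S \to \mathbb{R}$ and $t \ge 0$, and write $A^f_{s,t} := \langle q_{s,t} f, \nu_s^N\rangle$ 
as in Proposition 3.3. Recalling that, by (2.9), $\langle f, \mu_t\rangle = \langle q_{0,t} f, \mu_0\rangle$, we have 

$$\begin{aligned}
\langle f, \nu_t^N\rangle - \langle f, \mu_t\rangle &= \langle q_{t,t} f, \nu_t^N\rangle - \langle q_{0,t} f, \nu_0^N\rangle + \langle q_{0,t} f, \nu_0^N\rangle - \langle q_{0,t} f, \mu_0\rangle \\
&= A^f_{t,t} - A^f_{0,t} + \langle q_{0,t} f, \nu_0^N\rangle - \langle q_{0,t} f, \mu_0\rangle.
\end{aligned}$$

Taking expectations on both sides, we immediately obtain 

$$\mathbb{E}\big[\langle f, \nu_t^N\rangle\big] = \langle f, \mu_t\rangle,$$

because $s \mapsto A^f_{s,t}$ is a martingale by Proposition 3.3, and $\nu_0^N$ is the empirical distribution 
of $N$ i.i.d. random variables with distribution $\mu_0$. Moreover, by Proposition 3.3 and 
Lemma 3.4, 

$$\begin{aligned}
N\, \mathbb{E}\big[|\langle f, \nu_t^N\rangle - \langle f, \mu_t\rangle|^2\big]
&= N\, \mathbb{E}\big[(A^f_{t,t} - A^f_{0,t})^2\big] + N\, \mathbb{E}\big[(\langle q_{0,t} f, \nu_0^N\rangle - \langle q_{0,t} f, \mu_0\rangle)^2\big] \\
&= N\, \mathbb{E}\big[\langle A^f_{\cdot,t}\rangle_t\big] + \mathrm{Var}_{\mu_0}(q_{0,t} f) \\
&= \mathbb{E}\big[\langle 1, \nu_t^N\rangle \langle f^2, \nu_t^N\rangle - \langle (q_{0,t} f)^2, \nu_0^N\rangle\big] + \mathrm{Var}_{\mu_0}(q_{0,t} f) \\
&\quad + \mathbb{E}\int_0^t \langle H_s, \nu_s^N\rangle \langle (q_{s,t} f)^2, \nu_s^N\rangle\, ds - \mathbb{E}\int_0^t \langle 1, \nu_s^N\rangle \langle H_s (q_{s,t} f)^2, \nu_s^N\rangle\, ds \\
&\quad + \mathbb{E}\int_0^t \iint (H_s(x) - H_s(y))^+ (q_{s,t} f(y) - q_{s,t} f(x))^2\, \nu_s^N(dx)\, \nu_s^N(dy)\, ds.
\end{aligned}$$

The assertion now follows from Lemma 3.5 observing that 

$$\begin{aligned}
-\mathbb{E}\big[\langle (q_{0,t} f)^2, \nu_0^N\rangle\big] + \mathrm{Var}_{\mu_0}(q_{0,t} f) &= -\langle (q_{0,t} f)^2, \mu_0\rangle + \mathrm{Var}_{\mu_0}(q_{0,t} f) \\
&= -\langle q_{0,t} f, \mu_0\rangle^2 = -\langle f, \mu_t\rangle^2. \qquad \Box
\end{aligned}$$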



4. Proof of Theorem 2.5 
Proposition 4.1. Let $p, q, r \in [1,\infty]$ be such that $p^{-1} = q^{-1} + r^{-1}$. Then, for $0 \le s \le t$, 

E [V s N t (f)] <v.m 

+ {ms\\L^ s) \\q s ,tf\\hr M + \\H s \\ LP{ „jq s , t f 2 \\ LP{l , s) )e^. 

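Proof. Since $\langle f, \nu_s^N\rangle$ and $\langle g, \nu_s^N\rangle$ are unbiased estimators of $\langle f, \mu_s\rangle$ and $\langle g, \mu_s\rangle$, 
respectively, we have, by the Cauchy-Schwarz inequality, 

$$\begin{aligned}
\big|\mathbb{E}\big[\langle f, \nu_s^N\rangle \langle g, \nu_s^N\rangle\big] - \langle f, \mu_s\rangle \langle g, \mu_s\rangle\big|
&= \big|\mathbb{E}\big[(\langle f, \nu_s^N\rangle - \langle f, \mu_s\rangle)(\langle g, \nu_s^N\rangle - \langle g, \mu_s\rangle)\big]\big| \\
&\le \big(\mathbb{E}|\langle f, \nu_s^N\rangle - \langle f, \mu_s\rangle|^2\big)^{1/2} \big(\mathbb{E}|\langle g, \nu_s^N\rangle - \langle g, \mu_s\rangle|^2\big)^{1/2} \\
&\le \varepsilon_t^{N,p}\, \|f\|_{L^p(\mu_s)} \|g\|_{L^p(\mu_s)}
\end{aligned} \tag{4.1}$$

for all $0 \le s \le t$ and all functions $f, g : S \to \mathbb{R}$. Since the last term on the right-hand 
side of (2.4) can be bounded by 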

j I \H s {y)\{q s , t f{z) - q s , t f(y)) 2 ^(dz)^(dy), 

an application of (4.1) yields, by (1.3) and (2.10), 

E[V$(/)] < -<F s (g s , t /) 2 ,/x s )<l,/z s ) - (H s , f i s )(q s 4 2 -(q s 4) 2 , f i s ) 

+ fj \H s (y)\(q Stt f(z)-q s , t f(y)) 2 f i s (dz)^(dy) +e?*R., t (f) 
= V a>t (f) + e?*R a , t (f), 
where 

Rs,t(f) = \\ H s{qs,tf) 2 \\LP(ns) + \\ H s\\LP(^ s )\\Qs,tf 2 - {(ls,tf) 2 \\ LP ^ s) 

+ ll^slU^/^llfe,*/) 2 !!^^) + 2 \\ H sq S ,tf\\LP(^ s )\\Qs,tf\\LP(p s ) 
+ \\ H s(Qs,tf) 2 \\ LP(pa) 
< \\ H s\\LP( Ms )\\Qs,tf 2 \\LP( Ms ) + ^\\H s \\ Lq ^ s )\\q s%t f\\ 2 L 2r^ s y □ 

In order to bound $V^N_{s,t}(f)$ uniformly over $f \in L^p(\mu_t)$ with $\|f\|_{L^p(\mu_t)} \le 1$, one needs to 
be able to control $\|q_{s,t} f\|_{L^{2r}(\mu_s)}$ in terms of $\|f\|_{L^p(\mu_t)}$. This is possible if hypercontractivity 
holds and $t - s$ is sufficiently large. Over short time intervals $[s,t]$ we apply in a 
first step another rough estimate instead: 

Lemma 4.2. Let p > 2 and N £ N. Then forO<s<t, 

[*$(/)] <4osc(tf s )(l + ef^exp(2 f osc{H r ) dr)) 

J S 

Proof. Setting 

A{ := (f,v?) = (/,r / f)exp(- J\h 9 ,t,?) ds), 
we have ^/ = (/, r/f) for all / : S -»• R. Since 

1 N 1 N 2 



i=i i=i 



we obtain, recalling that 7^ is a probability measure, 

V s N t (f) < N(Al) 2 ((maxff; + max J ff+)(|^/|,r ? f > 2 + maxi^^/ 2 ) 1 / 2 ,^ ) 2 

+ 2osc(F s )(|^/|,r ? f) 2 ) 

< TV osc(^) (3( 9s , t |/|, ^f) 2 + ((q s ,tf 2 ) 1/2 , ^f) 2 ) • (4-2) 
Moreover, by inequality (4.1), 

E[(/,^) 2 ] <(/,^) 2 + £ f' P ||/|lL (Mt) , 
hence, taking expectations on both sides of (4.2), we obtain 

1e[^(/)] <3osc(^)[(g s , t |/|,^) 2 + £^||g s , t |/||| 2 p(Ats) ] 

+ osc(F s )[(g s , t / 2 , Ms )+e^||g s , t / 2 || 2 p/2( ^ ) ] 



< 4osc(F s ) 



(/ 2 ,^)+ef'fexp(2 f osc{H r )dr) 

J S 



LP(pt) 



where we have used the fact that $\langle q_{s,t} f, \mu_s\rangle = \langle f, \mu_t\rangle$, and the estimate 

$$\|q_{s,t} f\|_{L^p(\mu_s)} \le \exp\Big(\int_s^t \mathrm{osc}(H_r)\, dr\Big) \|f\|_{L^p(\mu_t)}. \tag{4.3}$$

The proof of (4.3) is elementary and can be found in [21]. □ 
Combining Proposition 4.1 and Lemma 4.2 we obtain the following (rough) a priori 
estimate: 

Lemma 4.3. Let $p, q, r \in [2,\infty]$ be such that $p^{-1} = q^{-1} + r^{-1}$, and choose $\delta$ as in 
Theorem 2.5. If 

$$N \ge 25 \max(1, C_t(p,q,\delta))$$

then 

$$\varepsilon_t^{N,p} \le 1.$$

Proof. Note that, by (2.10), 

$$V_{s,t}(f) \le 5 \|H_s\|_{L^q(\mu_s)} \|q_{s,t} f\|^2_{L^{2r}(\mu_s)}$$

for any $f : S \to \mathbb{R}$ and $0 \le s \le t$. Hence Proposition 4.1 implies 

$$\mathbb{E}\big[V^N_{s,t}(f)\big] \le \|H_s\|_{L^q(\mu_s)}\, C_{s,t}(p,q)^2\, \|f\|^2_{L^p(\mu_t)}\, (5 + 7\varepsilon_t^{N,p}).$$
Choosing N as stated we get 

^ J^ 5)+ m N df )} ds < Ill/IIL^) max (ef- l). 
On the other hand, by Lemma 4.2 and since 175 osc(H s ) < 1 for any s < t, we obtain 

* < ^(l + ^ 2/17 )ll/HL(, t ) 

<^i/iiW t ) max ( £ ^ 1 )- 



Hence by Proposition 2.1, since N > 50, we get 



sup {1 Var M( (/) + 1 j\[V s N t {f)] ds\f:S^R with ||/|| LP(Mr) < 1, r € [0,t]} 

< fl + - + lVax(ef' p ,l). □ 
V 50 25 2 J v * ' ' 

The a priori estimate just obtained can be used instead of Lemma 4.2 to estimate 
E [V s N t (f)] when t - s is small: 

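Lemma 4.4. Let $q \in\, ]6,\infty]$ and $p \in\, ]4q/(q-2), \infty[$. Suppose that 

$$N \ge 25 \max(1, C_t(\tilde p, q, \delta)),$$

where $\tilde p$ is defined by $\tilde p^{-1} = q^{-1} + (p/2)^{-1}$. Then for $0 \le s \le t \le t_0$, 

$$\mathbb{E}\big[V^N_{s,t}(f)\big] \le V_{s,t}(f) + 7 \exp\Big(2\int_s^t \mathrm{osc}(H_r)\, dr\Big) \|H_s\|_{L^q(\mu_s)} \|f\|^2_{L^p(\mu_t)}.$$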



Proof. Note that $\tilde p^{-1} = q^{-1} + (p/2)^{-1} < 1/2$ by the assumptions on $p$ and $q$. Applying 
Proposition 4.1 with $p$, $q$, $r$ replaced by $\tilde p$, $\tilde q := q$, and $\tilde r := p/2$, respectively, yields 
the estimate of Proposition 4.1 with $\varepsilon_t^{N,\tilde p}$ in place of $\varepsilon_t^{N,p}$ and with 
$\|q_{s,t} f\|_{L^{2\tilde r}(\mu_s)} = \|q_{s,t} f\|_{L^p(\mu_s)}$. 

Since $\tilde p < \min(q, p/2)$, the claim follows by Lemma 4.3 and the estimate (4.3). □ 
We are now ready to prove the theorem: 
Proof of Theorem 2.5. By Proposition 4.1 we have 

$$\mathbb{E}\big[V^N_{s,t}(f)\big] \le V_{s,t}(f) + 7 \|H_s\|_{L^q(\mu_s)}\, C_{s,t}(p,q)^2\, \|f\|^2_{L^p(\mu_t)}\, \varepsilon_t^{N,p}$$

for any $f : S \to \mathbb{R}$ and $0 \le s \le t$. Therefore by Proposition 2.1, Lemma 4.4, and the 
choice of $\delta$, 

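$$\begin{aligned}
N\, \mathbb{E}|\langle f, \nu_t^N\rangle - \langle f, \mu_t\rangle|^2
&= \mathrm{Var}_{\mu_t}(f) + \int_0^{(t-\delta)^+} \mathbb{E}\big[V^N_{s,t}(f)\big]\, ds + \int_{(t-\delta)^+}^t \mathbb{E}\big[V^N_{s,t}(f)\big]\, ds \\
&\le \mathrm{Var}_{\mu_t}(f) + \int_0^t V_{s,t}(f)\, ds \\
&\quad + \Big[ 7 C_t(p,q,\delta)\, \varepsilon_t^{N,p} + 7 e^{2/17} \int_{(t-\delta)^+}^t \|H_s\|_{L^q(\mu_s)}\, ds \Big] \|f\|^2_{L^p(\mu_t)}.
\end{aligned}$$

Observing that $\|H_s\|_{L^q(\mu_s)} \le \mathrm{osc}(H_s)$ and that $7e^{2/17}/17 \le 1$, we obtain (2.12). 

Furthermore, by maximizing (2.12) over all $f : S \to \mathbb{R}$ such that $\|f\|_{L^p(\mu_t)} \le 1$ and 
over $t$, we get 

$$N \varepsilon_t^{N,p} \le 2 + v_t(p) + 7 C_t(p,q,\delta)\, \varepsilon_t^{N,p}$$

for all $t \in [0, t_0]$. Recalling that $N \ge 25 C_t(p,q,\delta)$ by assumption, so that 
$N - 7C_t(p,q,\delta) \ge \tfrac{18}{25} N$, we obtain 

$$\begin{aligned}
\varepsilon_t^{N,p} &\le \frac{2 + v_t(p)}{N - 7 C_t(p,q,\delta)} = (2 + v_t(p)) \Big( \frac{1}{N} + \frac{7 C_t(p,q,\delta)}{N (N - 7 C_t(p,q,\delta))} \Big) \\
&\le (2 + v_t(p))\, N^{-1} \Big( 1 + 7 \cdot \frac{25}{18}\, C_t(p,q,\delta)\, N^{-1} \Big),
\end{aligned}$$

which implies (2.13). □ 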
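
The last elementary step can be checked with exact rational arithmetic. The sketch below is only an illustration: `v` and `C` are hypothetical stand-ins for $v_t(p)$ and $C_t(p,q,\delta)$, and the sample values are arbitrary.

```python
from fractions import Fraction

def exact_bound(v, C, N):
    """(2 + v) / (N - 7 C): the bound obtained by solving the inequality for epsilon."""
    return (2 + v) / (N - 7 * C)

def relaxed_bound(v, C, N):
    """(2 + v) N^{-1} (1 + 7 * (25/18) * C * N^{-1}): the simplified bound."""
    return (2 + v) / N * (1 + Fraction(7 * 25, 18) * C / N)

checks = []
v = Fraction(3)
for C in (Fraction(1, 10), Fraction(1), Fraction(7, 2), Fraction(40)):
    for factor in (Fraction(25), Fraction(30), Fraction(1000)):
        N = factor * C  # any N >= 25 C is admissible
        checks.append(exact_bound(v, C, N) <= relaxed_bound(v, C, N))
all_hold = all(checks)
```

For $N = 25\,C$ the two bounds coincide, so the factor $25/18$ cannot be improved under the assumption $N \ge 25\, C_t(p,q,\delta)$ alone.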



5. Proofs of Theorems 2.6 and 2.10 
Proof of Theorem 2.6. By the estimates in [21] we have, for < s < t < to, 

\\Qs,tf\\LP(n s ) 

for all / : S ->■ R, provided 

A s > V - A s + Pk±3. to Bs for all s G [0, i ]. (5.1) 
Hence, under this condition, we get C s j(p) < 2 1 / 4 . Moreover, by [21], 

\\Qt-6,tf\\LiQH-l) - eXp ( / ™* H r dr ) 11/11 LP( Mt ) 

for all / : 5 -»• R and < 5 < t < t , provided 

K > j| lo § ~^~~T ^ all s G [0, i ] • (5.2) 

4:0 P — 1 

Choosing S = (nuj)^ 1 , we obtain that, for s < t — S, 

\\Qs,tf\\LP(n a ) = \\Qs,t-SQt-S,tf\\LP(ti s ) - 2l/4el/17 \\f\\Li( fH ), 
if both (5.1) and (5.2) hold. Hence 

provided (5.1) holds and 

7 S 2r — 1 2p — 2 

As > log max ( , — ) for all s G [0, t ] . 

4o p — 1 p — 2 ' 
Since $2 < \tilde p < p$ and $\tilde p^{-1} = q^{-1} + (p/2)^{-1}$, we obtain similarly that $C_{s,t}(\tilde p, q) \le 2^{1/4} e^{1/17}$ 
provided (5.1) holds and 

A s > ^logmax(? — — — ) for all s G [0, to]. 
4o v p — 1 p — 2 7 

Hence by (2.14) and (2.15) we obtain 

$$v_t(p) \le 5 \cdot 2^{1/2} K_t(2), \qquad C_t(p,q,\delta) \le 2^{1/2} e^{2/17} K_t(q), \qquad C_t(\tilde p, q, \delta) \le 2^{1/2} e^{2/17} K_t(q)$$

for any $t \le t_0$. The assertion now follows from Theorem 2.5. □ 



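Proof of Lemma 2.2. For a function $f : S \to \mathbb{R}$ and $t \ge 0$ let $f_t := f - \langle f, \mu_t\rangle$. Then 

$$\langle f_t, \eta_t^N\rangle = \langle f, \eta_t^N\rangle - \langle f, \mu_t\rangle$$

and, by (1.9), 

$$\langle f_t, \nu_t^N\rangle = \langle 1, \nu_t^N\rangle \langle f_t, \eta_t^N\rangle. \tag{5.3}$$

Hence 

$$\begin{aligned}
\mathbb{E}\big[\langle f_t, \eta_t^N\rangle^2\big] &\le 2\, \mathbb{E}\big[(\langle f_t, \eta_t^N\rangle - \langle f_t, \nu_t^N\rangle)^2\big] + 2\, \mathbb{E}\big[\langle f_t, \nu_t^N\rangle^2\big] \\
&= 2\, \mathbb{E}\big[(\langle 1, \nu_t^N\rangle - 1)^2 \langle f_t, \eta_t^N\rangle^2\big] + 2\, \mathbb{E}\big[\langle f_t, \nu_t^N\rangle^2\big] \\
&\le 2 \|f_t\|^2_{\sup}\, \mathbb{E}\big[(\langle 1, \nu_t^N\rangle - 1)^2\big] + 2\, \mathbb{E}\big[\langle f_t, \nu_t^N\rangle^2\big].
\end{aligned}$$

Applying this bound and (5.3), we obtain the $L^1$ estimate: 

$$\begin{aligned}
\mathbb{E}\big[|\langle f_t, \eta_t^N\rangle|\big] &\le \mathbb{E}\big[|\langle f_t, \eta_t^N\rangle (1 - \langle 1, \nu_t^N\rangle)|\big] + \mathbb{E}\big[|\langle f_t, \nu_t^N\rangle|\big] \\
&\le \mathbb{E}\big[\langle f_t, \eta_t^N\rangle^2\big]^{1/2}\, \mathbb{E}\big[(\langle 1, \nu_t^N\rangle - 1)^2\big]^{1/2} + \mathbb{E}\big[\langle f_t, \nu_t^N\rangle^2\big]^{1/2} \\
&\le \mathbb{E}\big[\langle f_t, \nu_t^N\rangle^2\big]^{1/2} + \sqrt{2}\, \|f_t\|_{\sup}\, \mathbb{E}\big[(\langle 1, \nu_t^N\rangle - 1)^2\big] \\
&\quad + \sqrt{2}\, \mathbb{E}\big[\langle f_t, \nu_t^N\rangle^2\big]^{1/2}\, \mathbb{E}\big[(\langle 1, \nu_t^N\rangle - 1)^2\big]^{1/2}.
\end{aligned}$$

This proves Lemma 2.2. □ 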



Proof of Corollary 2.8. The first assertion is an immediate consequence of (2.12) and 
(2.18). The second assertion follows by the first one and (2.6). □ 

Proof of Theorem 2.10. Fix $i \in I$ and define 

$$h_t(i) := \langle H_t, \mu_t^i\rangle = \int_{S_i} H_t\, d\mu_t \Big/ \mu_t(S_i).$$

Note that 

$$h_t(i) = -\frac{d}{dt} \log \mu_t(S_i).$$

Since (1.2) and (1.3) hold, $H_t^i = H_t - h_t(i)$ is the negative logarithmic time derivative of 
$\mu_t^i$. If we define $q^i_{s,t} f$ for functions $f : S_i \to \mathbb{R}$ in the same way as $q_{s,t} f$ with $H_t$ replaced 
by $H_t^i$, then, for $x \in S_i$, 

$$q_{s,t} f(x) = \exp\Big(-\int_s^t h_r(i)\, dr\Big)\, q^i_{s,t} f(x) = \frac{\mu_t(S_i)}{\mu_s(S_i)}\, q^i_{s,t} f(x).$$

In particular, for $p \in [1,\infty]$, we have 

$$\|q_{s,t} f\|_{\tilde L^p(\mu_s)} = \max_{i \in I} \|q_{s,t} f\|_{L^p(\mu_s^i)} \le \max_{i \in I} \frac{\mu_t(S_i)}{\mu_s(S_i)} \cdot \max_{i \in I} \|q^i_{s,t} f\|_{L^p(\mu_s^i)}. \tag{5.4}$$

Assuming Poincaré and log Sobolev inequalities with respect to the measures $\mu_t^i$ and the 
functions $H_t^i$, we obtain the same type of $L^p$-$L^q$ bounds for the operators $q^i_{s,t}$ as we did 
for the operators $q_{s,t}$ in the proof of Theorem 2.6. Because of (5.4) the assertion then 
follows similarly as above. □ 

Appendix A. Spectral gap and LSI for 1D Metropolis 

In this appendix we prove upper bounds for the Poincaré and logarithmic Sobolev 
constants for Random Walk Metropolis algorithms on a finite subset $S$ of $\mathbb{Z}$. Let $S := 
\{a, a+1, \ldots, -1, 0, 1, \ldots, a + \Delta - 1\}$ with $a \in \mathbb{Z}$ and $\Delta \in \mathbb{N}$ such that $0 \in S$. We assume 
that $\mu$ is a probability measure on $S$ satisfying 

(i) $\mu(x) \le \rho\, \mu(y)$ for any $x, y \in [-s, s]$; 

(ii) $\mu(x+1) \le \alpha\, \mu(x)$ for any $x \ge s$, and $\mu(x-1) \le \alpha\, \mu(x)$ for any $x \le -s$, 

for appropriate constants $s \in \mathbb{Z}_+$, $\rho \in [1, +\infty[$, and $\alpha \in\, ]0,1[$. For notational convenience, 
we set 

$$b := a + \Delta - 1, \qquad r := \frac{1}{1-\alpha} \wedge \Delta, \qquad u := s \wedge \Delta.$$

The Random Walk Metropolis chain for sampling from p is the Markov chain on S with 
generator C satisfying 



C(x,y) 



'^(SS' 1 )' if ^" a; l = 1 ' 
k 0, if|y-x|>l. 



To estimate the Poincaré constant for this dynamics, we can apply a general upper 
bound for one-dimensional Markov chains due to Miclo [32], which implies in our case 

$$C_{\mathrm{Poi}} \le 4 \max(B^+, B^-), \tag{A.1}$$

where 

$$B^+ := \max_{1 \le k \le b} B_k^+, \qquad B_k^+ := \sum_{x=1}^{k} \frac{1}{\mu(x-1) \wedge \mu(x)}\ \sum_{x=k}^{b} \mu(x),$$

$$B^- := \max_{a \le k \le -1} B_k^-, \qquad B_k^- := \sum_{x=k}^{-1} \frac{1}{\mu(x+1) \wedge \mu(x)}\ \sum_{x=a}^{k} \mu(x).$$
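
For a concrete measure, the quantities $B^+$ and $B^-$ can be evaluated directly from their definitions. The following is a minimal sketch under the assumption that $\mu$ is given as a dictionary of positive masses on $\{a, \ldots, b\}$ with $a \le 0 \le b$; the function name `miclo_constants` is hypothetical.

```python
def miclo_constants(mu, a, b):
    """Return (B_plus, B_minus) for a probability mass function mu on {a, ..., b}.

    mu maps each integer in {a, ..., b} to its positive mass; a <= 0 <= b is assumed.
    """
    def b_plus(k):
        # sum_{x=1}^{k} 1 / (mu(x-1) ^ mu(x))  times  sum_{x=k}^{b} mu(x)
        resistance = sum(1.0 / min(mu[x - 1], mu[x]) for x in range(1, k + 1))
        tail = sum(mu[x] for x in range(k, b + 1))
        return resistance * tail

    def b_minus(k):
        # sum_{x=k}^{-1} 1 / (mu(x+1) ^ mu(x))  times  sum_{x=a}^{k} mu(x)
        resistance = sum(1.0 / min(mu[x + 1], mu[x]) for x in range(k, 0))
        tail = sum(mu[x] for x in range(a, k + 1))
        return resistance * tail

    B_plus = max((b_plus(k) for k in range(1, b + 1)), default=0.0)
    B_minus = max((b_minus(k) for k in range(a, 0)), default=0.0)
    return B_plus, B_minus

# Uniform distribution on {-2, ..., 2} as a simple sanity check.
uniform = {x: 1.0 / 5 for x in range(-2, 3)}
B_plus, B_minus = miclo_constants(uniform, -2, 2)
poincare_bound = 4 * max(B_plus, B_minus)
```

In the uniform example one finds $B_k^+ = k(3-k)$ for $k = 1, 2$, so both constants equal 2 and the right-hand side of (A.1) is 8.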

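The bound is sharp up to a factor 4, see [32]. We are going to estimate $B_k^+$ in the cases 
$k \ge s$ and $k < s$ separately. Corresponding bounds hold for $B_k^-$. Let us assume first 
that $k \ge s$. Then we have, by (ii), 

$$\sum_{x=s+1}^{k} \frac{1}{\mu(x-1) \wedge \mu(x)} = \sum_{x=s+1}^{k} \frac{1}{\mu(x)} \le \frac{1}{\mu(k)} \sum_{i=0}^{k-s-1} \alpha^i \le \frac{r}{\mu(k)},$$

and, by (i) and (ii), 

$$\sum_{x=1}^{s} \frac{1}{\mu(x-1) \wedge \mu(x)} \le \frac{\rho u}{\mu(s)} \le \frac{\alpha^{k-s} \rho u}{\mu(k)}.$$

Hence 

$$\sum_{x=1}^{k} \frac{1}{\mu(x-1) \wedge \mu(x)} \le \frac{r + \alpha^{k-s} \rho u}{\mu(k)}. \tag{A.2}$$

Similarly, by (ii), 

$$\sum_{x=k}^{b} \mu(x) \le \mu(k) \sum_{i=0}^{b-k} \alpha^i \le r \mu(k). \tag{A.3}$$

Therefore (A.2) and (A.3) yield 

$$B_k^+ \le r (r + \alpha^{k-s} \rho u) \le r^2 + \rho u r \quad \text{for any } k \ge s. \tag{A.4}$$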

Let us now consider the case $k < s$: by (i) and since $s \wedge b \le u$, we have 

$$\sum_{x=1}^{k} \frac{1}{\mu(x-1) \wedge \mu(x)}\ \sum_{x=k}^{s \wedge b - 1} \mu(x) \le \rho\, k\, (s \wedge b - k) \le \frac{1}{4} \rho u^2.$$

Moreover, similarly to (A.3), we have 

$$\sum_{x = s \wedge b}^{b} \mu(x) \le r\, \mu(s \wedge b),$$

hence, by (i) and since $k < s$ and $k \le \Delta$, 

$$\sum_{x=1}^{k} \frac{1}{\mu(x-1) \wedge \mu(x)}\ \sum_{x = s \wedge b}^{b} \mu(x) \le r \sum_{x=1}^{k} \frac{\mu(s \wedge b)}{\mu(x-1) \wedge \mu(x)} \le \rho k r \le \rho u r.$$

Combining these estimates, we obtain 

$$B_k^+ \le \frac{1}{4} \rho u^2 + \rho u r \quad \text{for any } k < s. \tag{A.5}$$

By (A.4) and (A.5), we finally obtain 

$$B^+ := \max_{1 \le k \le b} B_k^+ \le \rho u r + \max(r^2, \rho u^2/4).$$

Observing that the same estimate holds for $B^-$, we have shown: 

Theorem A.1. The Poincaré constant $C_{\mathrm{Poi}}$ for the Random Walk Metropolis chain 
with stationary distribution $\mu$ satisfies 

$$C_{\mathrm{Poi}} \le 4 \rho u r + \max(4 r^2, \rho u^2).$$

Proof. The result holds by the upper bound (A.1). □ 

For the corresponding logarithmic Sobolev constant the following upper bound follows 
from the results in [32]: 

$$\gamma \le 20 \max(\beta^+, \beta^-),$$

where 

$$\beta^+ := \max_{1 \le k \le b} \beta_k^+, \qquad \beta_k^+ := \sum_{x=1}^{k} \frac{1}{\mu(x-1) \wedge \mu(x)}\ \sum_{x=k}^{b} \mu(x)\ \Big| \log \sum_{x=k}^{b} \mu(x) \Big|,$$

$$\beta^- := \max_{a \le k \le -1} \beta_k^-, \qquad \beta_k^- := \sum_{x=k}^{-1} \frac{1}{\mu(x+1) \wedge \mu(x)}\ \sum_{x=a}^{k} \mu(x)\ \Big| \log \sum_{x=a}^{k} \mu(x) \Big|.$$

Again, the bound is sharp up to an explicit numerical constant. A rough estimate for 
$\beta_k^+$ can easily be obtained observing that 

$$\Big| \log \sum_{x=k}^{b} \mu(x) \Big| = \log \Big( \sum_{x=k}^{b} \mu(x) \Big)^{-1} \le \log \frac{1}{\mu(k)} \le \log \frac{1}{\mu_*},$$

where $\mu_* = \min_x \mu(x)$. In fact, this implies 

$$\beta_k^+ \le 2 B_k^+ \log \frac{1}{\mu_*},$$

hence upper bounds for $\beta^+$ and $\beta^-$ can be obtained from the corresponding bounds for 
$B^+$ and $B^-$ simply by multiplying by a factor $2 \log \mu_*^{-1}$. In particular, the upper bound 
for $C_{\mathrm{Poi}}$ derived above yields an upper bound for $\gamma$: 

Theorem A.2. One has 

$$\gamma \le 10 \big(4 \rho u r + \max(\rho u^2, 4 r^2)\big) \log \frac{1}{\mu_*}.$$

Example: A discrete Gauss model. Assume that 

$$\mu(x) \propto \exp\Big( -\frac{x^2}{2\sigma^2} \Big)$$

for some finite constant $\sigma > 0$. Then one can check that (i) and (ii) above are satisfied 
with 

Ms + i ) ( 

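Note that $\alpha \le e^{-1/2}$ for $\sigma \le 1$ and $\alpha \le e^{-3/(4\sigma)}$ for $\sigma \ge 1$. Applying the elementary 
inequality $1 - e^{-x} \ge \min(2x/3, 1/2)$, we obtain $1 - \alpha \ge 1/(2\sigma)$ if $\sigma \ge 1$ and $1 - \alpha \ge 1/3$ 
if $\sigma \le 1$. Hence 

$$r = \frac{1}{1-\alpha} \wedge \Delta \le (2\sigma \vee 3) \wedge \Delta \le 2 \big( (\sigma \wedge \Delta) \vee 2 \big).$$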
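
The elementary inequality and the resulting lower bounds on $1 - \alpha$ are easy to confirm numerically. The sketch below is illustrative only: the grid, the helper names, and the use of the stated upper bounds for $\alpha$ in place of $\alpha$ itself are assumptions made for the check.

```python
import math

def lower_bound_gap(x):
    """1 - exp(-x) - min(2x/3, 1/2); the inequality claims this is >= 0 for x >= 0."""
    return 1.0 - math.exp(-x) - min(2.0 * x / 3.0, 0.5)

# Check the inequality on a grid of points in [0, 10].
grid = [k / 1000.0 for k in range(0, 10001)]
worst = min(lower_bound_gap(x) for x in grid)

def one_minus_alpha_bound(sigma):
    """Lower bound on 1 - alpha obtained from the stated upper bounds on alpha."""
    if sigma >= 1.0:
        alpha_bound = math.exp(-3.0 / (4.0 * sigma))
    else:
        alpha_bound = math.exp(-0.5)
    return 1.0 - alpha_bound

sigmas = [0.1, 0.5, 0.99, 1.0, 2.0, 4.0, 100.0]
targets = [1.0 / (2.0 * s) if s >= 1.0 else 1.0 / 3.0 for s in sigmas]
bounds_hold = all(one_minus_alpha_bound(s) >= t for s, t in zip(sigmas, targets))
```

The gap vanishes only at $x = 0$, and at the kink $x = 3/4$ of the minimum it is still positive, which is why the constant $2/3$ works on $[0, 3/4]$.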

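By Theorem A.1, we then obtain 

$$C_{\mathrm{Poi}} \le 30 \big( (\sigma \wedge \Delta) \vee 2 \big)^2.$$

Moreover, since $-\Delta \le a \le b \le \Delta$, one has 

$$\frac{\mu(k)}{\mu(0)} = \exp\Big( -\frac{k^2}{2\sigma^2} \Big) \ge \exp\Big( -\frac{\Delta^2}{2\sigma^2} \Big) \quad \text{for any } k \in S,$$

and thus 

$$\log \frac{1}{\mu_*} \le \frac{1}{2} (\Delta/\sigma)^2 + \log \frac{1}{\mu(0)} \le \frac{1}{2} (\Delta/\sigma)^2 + \log \Delta.$$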
Therefore we obtain, by Theorem A. 2, 

7 < 150 ((<r A A) V 2) 2 (A/o-) 2 + 300 ((<r A A) V 2) 2 log A 
< 300 (^y) 2 + 300 ((cr A A) V 2) 2 log A. 